Minimal Generators in Optimal Time

Ellert, Jonas; Gawrychowski, Paweł; Starikovskaya, Tatiana

doi:10.4230/LIPIcs.CPM.2025.14

Jonas Ellert

DIENS, École normale supérieure de Paris, PSL Research University, France

Paweł Gawrychowski

Institute of Computer Science, University of Wrocław, Poland

Tatiana Starikovskaya

DIENS, École normale supérieure de Paris, PSL Research University, France

Abstract

A walk of length $n$ on a string $S$ of length $m$ is a function $f:\{1,\dots,n\}\rightarrow\{1,\dots,m\}$ such that $\forall i\in\{2,\dots,n\}:\left\lvert f(i)-f(i-1)\right\rvert\leq 1$ . The walk generates the string $T$ of length $n$ defined by ${\forall i\in\{1,\dots,n\}:T[i]=S[f(i)]}$ . Intuitively, this can be seen as walking $n$ steps in $S$ and outputting the encountered symbols, where in each step we either remain at the same position, or move one position to the left or to the right. The minimal generator of a string $T$ is the shortest string $S$ such that a walk on $S$ generates $T$ . Recently, it was shown that each string admits exactly one (up to reversal) minimal generator (Pratt-Hartmann, CPM 2024). However, no efficient algorithm for computing the minimal generator was known. We provide an optimal algorithm for this task, taking ${\mathcal{O}}(n)$ time for a string of length $n$ over general unordered alphabet, i.e., accessing the string only by equality comparisons of symbols. The main challenge is to detect substrings of the form $axb\tilde{x}axb$ and replace them with $a x b$ , where $a, b$ are symbols and $x$ is a string with reversal $\tilde{x}$ . We solve this problem with a non-trivial adaptation of Manacher’s classic algorithm for computing maximal palindromic substrings (Manacher, J. ACM 1975). To obtain the final algorithm, we solve small subinstances of the problem in optimal time by adapting the “Four Russians” technique to strings over general unordered alphabet, which may be of independent interest.

Keywords and phrases:

string algorithms, walking on words, minimal generator, palindromic substrings, general unordered alphabet, decision tree complexity

Funding:

Jonas Ellert: Partially supported by grant ANR-20-CE48-0001 from the French National Research Agency (ANR).

Paweł Gawrychowski: Partially supported by the Polish National Science Centre grant number 2023/51/B/ST6/01505.

Tatiana Starikovskaya: Partially supported by grant ANR-20-CE48-0001 from the French National Research Agency (ANR).

2012 ACM Subject Classification:

Theory of computation $\rightarrow$ Pattern matching

Event:

36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Editors:

Paola Bonizzoni and Veli Mäkinen

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction and Related Work

Given some non-empty string $S$ of length $m$ , go to any position in $S$ . Now repeat the following routine $n$ times. Print the symbol at the current position, and then either stay at that position, or move one position to the left (if possible), or move one position to the right (if possible). The result is a string $T$ of length $n$ written in an append-only manner. Pratt-Hartmann describes this procedure as going for a walk on $S$ [22], and he formally models the walk as a function $f:\{1,\dots,n\}\rightarrow\{1,\dots,m\}$ that satisfies $\forall i\in\{2,\dots,n\}:\left\lvert f(i)-f(i-1)\right\rvert\leq 1$ . Using this function, $T$ is defined by $\forall i\in\{1,\dots,n\}:T[i]=S[f(i)]$ . Intuitively, we say that some string $S$ generates another string $T$ (or that $S$ is a generator of $T$ ) if and only if there is a (not necessarily unique) walk on $S$ that results in $T$ .
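The definition of a walk can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names `is_walk` and `generate` are hypothetical, and the walk is passed as a list of 1-based positions to match the notation of the text.

```python
def is_walk(f, m):
    """Check that f (a list of 1-based positions) is a walk on a string of length m."""
    if not f or any(p < 1 or p > m for p in f):
        return False
    # Consecutive positions may differ by at most one.
    return all(abs(f[i] - f[i - 1]) <= 1 for i in range(1, len(f)))


def generate(S, f):
    """Return the string T with T[i] = S[f(i)] for a walk f on S."""
    if not is_walk(f, len(S)):
        raise ValueError("f is not a walk on S")
    return "".join(S[p - 1] for p in f)
```

For instance, the walk `[1, 2, 3, 2, 1, 1]` on `"abc"` moves right twice, left twice, and then stays in place, generating `"abcbaa"`.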

While a string may have many distinct generators, a natural question is to ask for a generator of minimal length, henceforth called a minimal generator. (In [22, 21], a minimal generator is called a primitive generator. We use different terminology to avoid confusion with the other definition of primitiveness, where a string is primitive if it cannot be written as the $k$ times concatenation of a shorter string for some integer $k$.) Pratt-Hartmann shows that the minimal generator of a string is unique up to reversal. While he does not consider the problem of efficiently computing the minimal generator of a given string, he implicitly provides a simple algorithmic idea for doing so. Particularly, he shows that the minimal generator of $T$ can be obtained by starting with $T$ and then repeatedly substituting $(i)$ substrings of the form $a a$ with $a$, $(ii)$ substrings of the form $axb\tilde{x}axb$ with $a x b$, and $(iii)$ prefixes (resp. suffixes) of the form $axb\tilde{x}a$ with $b\tilde{x}a$ (resp. $a x b$). Here, $a, b$ are symbols and $x$ is a possibly empty string with reversal $\tilde{x}$. As soon as no further substitutions are possible, we have obtained the minimal generator. We provide an ${\mathcal{O}}(n)$-time algorithm that computes a minimal generator via the described substitutions. It works over general unordered alphabet, i.e., it accesses the input string only via equality comparisons of symbols.
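To illustrate the three substitutions, the following Python sketch applies them naively until none is possible. It is a brute-force reference, not the linear-time algorithm of this paper: each round rescans the whole string, so the running time is polynomial but far from optimal. The function names are hypothetical, and indices are 0-based as usual in Python.

```python
def is_twin_palindrome(w):
    """True if w = a x b rev(x) a x b for symbols a, b and a possibly empty string x."""
    if len(w) < 4 or (len(w) - 1) % 3 != 0:
        return False
    k = (len(w) - 1) // 3  # k = |x| + 1, hence |axb| = k + 1
    axb, rev_x = w[:k + 1], w[k + 1:2 * k]
    return rev_x == axb[1:-1][::-1] and w[2 * k:] == axb


def reduce_once(s):
    """Apply one substitution of type (i), (ii), or (iii); return None if none applies."""
    n = len(s)
    for i in range(n - 1):  # (i) aa -> a
        if s[i] == s[i + 1]:
            return s[:i] + s[i + 1:]
    for k in range(1, (n - 1) // 3 + 1):  # (ii) axb rev(x) axb -> axb
        for p in range(n - 3 * k):
            if is_twin_palindrome(s[p:p + 3 * k + 1]):
                return s[:p + k + 1] + s[p + 3 * k + 1:]
    # (iii) Without unit-squares, every non-trivial palindrome has odd length.
    for length in range(3, n + 1, 2):
        h = length // 2
        if s[:length] == s[:length][::-1]:  # prefix axb rev(x) a -> b rev(x) a
            return s[h:]
        if s[n - length:] == s[n - length:][::-1]:  # suffix axb rev(x) a -> axb
            return s[:n - h]
    return None


def minimal_generator_naive(T):
    """Repeat substitutions until none is possible."""
    s = T
    while True:
        t = reduce_once(s)
        if t is None:
            return s
        s = t
```

With this particular order of substitutions, `minimal_generator_naive("abab")` returns `"ab"` and `minimal_generator_naive("abcba")` returns `"cba"`. A different order could return the reversal instead.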

Theorem 1.

There is an algorithm that computes the minimal generator of a length- $n$ string over general unordered alphabet in ${\mathcal{O}}(n)$ time.

1.1 Technical Overview and Roadmap

In Section 2, we introduce the formal definitions and notation used throughout the paper, and then show that substitutions of types $(i)$ and $(iii)$ are easy to realize. The remainder of the paper focuses on substitutions of type $(ii)$ .

In Section 3, we provide a simple algorithm that performs all substitutions of type $(ii)$ in ${\mathcal{O}}(n\log n)$ time. It follows the same divide-and-conquer strategy as Main and Lorentz’s classic algorithm for detecting squares [18]. First, we consider the halves $T_{1}=T[1\operatorname{{.}\,{.}}\left\lfloor\frac{n}{2}\right\rfloor]$ and $T_{2}=T[\left\lfloor\frac{n}{2}\right\rfloor+1\operatorname{{.}\,{.}}n]$ and recursively perform all type- $(ii)$ substitutions within them. Then, any remaining substitution in $T_{1}T_{2}$ must cross the boundary between $T_{1}$ and $T_{2}$ . Using a modified version of Manacher’s algorithm for computing maximal palindromic substrings [19], we find and perform all substitutions at the boundary in linear time.

In Section 4, we improve the time to ${\mathcal{O}}(n\log^{*}n)$. (Here, $\log^{*}n$ denotes the extremely slowly growing iterated logarithm defined by $\log^{*}n=0$ if $n\leq 1$ and $\log^{*}n=1+\log^{*}\log n$ otherwise.) Instead of splitting $T$ into halves, we split it into ${\mathcal{O}}(n/\log^{2}n)$ blocks of length ${\mathcal{O}}(\log^{2}n)$. We recursively perform the type-$(ii)$ substitutions in each block. Then, a more sophisticated adaptation of Manacher’s algorithm combines the reduced blocks and performs the remaining substitutions in linear time.

In Section 5, we finally achieve linear time by modifying the ${\mathcal{O}}(n\log^{*}n)$ time algorithm. Once the recursion of the algorithm reaches blocks of size $s=o(\log\log\log n)$ (which happens after a constant number of levels of recursion), we stop the recursion and solve the blocks of the current level in optimal time by using precomputed data structures. Particularly, by brute force, we precompute a decision tree of minimal height for solving blocks of size $s$ . Peculiarly, this results in a time-wise optimal solution, but we do not immediately know the time complexity of this solution. We show that the height of the decision tree is indeed ${\mathcal{O}}(s)$ in Appendix A. Hence we solve instances of size $s$ in linear time, as required. A similar technique has been used in [20, 25].

1.2 Related Work

Optimum Stack Generation.

In [23], Tarjan introduced the optimum stack generation problem, where one is given a finite alphabet $\Sigma$ , a stack, and a string $\alpha\in\Sigma^{n}$ , and must find the shortest sequence of stack operations – push, print the top character in the stack, and pop – printing $\alpha$ . The stack must be empty in the beginning and at the end of the algorithm. While the problem seems to be quite similar to the problem of computing a minimal generator, no fast algorithms are known. A simple dynamic programming algorithm solves it in ${\mathcal{O}}(n^{3})$ time, and the current best solution is by Bringmann et al. [6] and requires $\tilde{{\mathcal{O}}}(n^{2.8244})$ randomised time or $\tilde{{\mathcal{O}}}(n^{2.8603})$ deterministic time on a constant-size alphabet.

Repetition Detection in Dynamic Strings.

Our algorithm repeatedly replaces substrings of the form $axb\tilde{x}axb$ with $a x b$ , where $a, b$ are symbols and $x$ is a string. In other words, it must be able to detect substrings of form $axb\tilde{x}axb$ in a dynamic string, where one is allowed to update the input by deleting its substrings. The problem of detecting repetitive structures in a dynamic string has been an active branch of research recently. The problem of maintaining the longest palindromic substring was considered in [1, 24, 13, 3], where the update time of the current best algorithm by Amir et al. [3] is $\tilde{{\mathcal{O}}}(\sqrt{n})$ . The longest square of a dynamic string can be maintained in $\operatorname{\textnormal{polylog}}(n)$ time per update [7], while maintaining the runs can be done in $n^{o(1)}$ time [2].

Repetition Detection and Other Problems Over Unordered Alphabets.

Our algorithm accesses the string via equality comparisons of symbols, i.e., we do not assume an order on the alphabet. Since order is not needed to define repetitive structures (like palindromes, squares, or runs), it is an intriguing question whether the presence of order facilitates the detection of such structures. We start with some examples where order is beneficial. Squares and runs can be detected in linear time over an ordered alphabet [10], whereas there is a tight bound of $\Theta(n\log\sigma)$ symbol comparisons to decide square-freeness of a string over unordered alphabet of size $\sigma$ [12]. There is an ${\mathcal{O}}(n\log\sigma)$ -time algorithm for reporting all runs on such a string [9, 11]. Computing the Lempel-Ziv factorization has a tight time bound of $\Theta(n\log\sigma)$ for ordered alphabet [17], and $\Theta(n\sigma)$ for unordered alphabet [9, 12]. On the other hand, there are linear time algorithms for several problems over unordered alphabet, e.g., pattern matching in one [15] and two [14] dimensions, detecting the longest palindromic substring [19], computing an unbordered conjugate [8], finding the leftmost critical factorization [16], and (as shown in this paper) computing a minimal generator.

2 Preliminaries

Intervals, Alphabets, Strings.

For $i,j\in\mathbb{Z}$ , we write $[i,j]=[i,j+1)=(i-1,j]=(i-1,j+1)$ to denote the integer range $\{i,i+1,\dots,j\}$ (or the empty set if $i>j$ ). Let $n\in\mathbb{N}$ . A string $T$ of length $\left\lvert T\right\rvert=n$ is a sequence $T[1]T[2]\dots T[n]$ of symbols from an alphabet $\Sigma$ (or the empty string $\epsilon$ if $n=0$ ). We simply write $T[1\operatorname{{.}\,{.}}n]$ to denote that $T$ is a string of length $n$ . Every pair $i,j\in[1,n]$ defines a substring $T[i\operatorname{{.}\,{.}}j]=T[i\operatorname{{.}\,{.}}j+1)=T(i-1\operatorname{{.}\,{.}}j]=T(i-1\operatorname{{.}\,{.}}j+1)=T[i]T[i+1]\dots T[j]$ (which equals $\epsilon$ if $i>j$ ). Index $i$ is the position of substring $T[i\operatorname{{.}\,{.}}j]$ . A substring $T[1\operatorname{{.}\,{.}}i]$ is a prefix of $T$ , while a substring $T[i\operatorname{{.}\,{.}}n]$ is a suffix of $T$ . The reversal of $T$ is defined as $\tilde{T}=\textnormal{{rev}}(T)=T[n]T[n-1]\dots T[1]$ . Another string $S[1\operatorname{{.}\,{.}}m]$ is a substring of $T$ if there is $i\in[1,n-m+1]$ such that $S[1\operatorname{{.}\,{.}}m]=T[i\operatorname{{.}\,{.}}i+m)$ , i.e., $\forall j\in[1,m]:T[i+j-1]=S[j]$ . From now on, we use $T$ and $S$ to respectively denote the input and output string, and $\alpha,\beta,\gamma,\delta$ to denote intermediate strings. We use the terms string and array interchangeably.

Palindromes, i.e., strings that satisfy $\alpha=\tilde{\alpha}$ , are essential for the study of minimal generators. A fundamental tool for detecting palindromic substrings is Manacher’s algorithm [19]. It computes, for each position $i\in[1,n]$ of $T$ , the maximal radius $\ell\in[0,\min(i-1,n-i)]$ such that $T[i-\ell\operatorname{{.}\,{.}}i+\ell]$ is a palindrome in overall ${\mathcal{O}}(n)$ time. Note that, while not strictly required, familiarity with the details of Manacher’s algorithm will be beneficial for understanding the new algorithm for computing minimal generators.
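For readers less familiar with it, the following is a minimal Python sketch of the textbook version of Manacher's algorithm, restricted to odd-length palindromes. It is not the modified variant developed later in this paper. The function name `odd_palindrome_radii` is hypothetical, indices are 0-based, and symbols are only ever compared for equality.

```python
def odd_palindrome_radii(T):
    """Return R with R[i] = maximal h such that T[i-h..i+h] is a palindrome (0-based)."""
    n = len(T)
    R = [0] * n
    c = r = 0  # center and (inclusive) right end of the rightmost palindrome found so far
    for i in range(n):
        if i < r:
            # Reuse the radius of the mirrored center, capped at the known boundary.
            R[i] = min(R[2 * c - i], r - i)
        # Extend by explicit equality comparisons.
        while i - R[i] - 1 >= 0 and i + R[i] + 1 < n and T[i - R[i] - 1] == T[i + R[i] + 1]:
            R[i] += 1
        if i + R[i] > r:
            c, r = i, i + R[i]
    return R
```

Every successful comparison in the inner loop moves the right boundary `r` forward, which is why the total number of comparisons is linear in the length of the input.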

Model of Computation.

An oracle string $T[1\operatorname{{.}\,{.}}n]$ can only be accessed via the following type of oracle query. Given query positions $i,j\in[1,n]$ , the oracle answers if $T[i]=T[j]$ . Algorithms that receive an oracle string as input are more commonly described as working over general (unordered) alphabet, or as accessing the input via equality comparisons. We present algorithms that (apart from the oracle queries) work in the word RAM model, i.e., they perform an interleaved sequence of oracle queries and word RAM operations. The time complexity of an algorithm is the sum of the number of oracle queries and the number of word RAM operations performed. The size of a machine word is assumed to be at least $\log_{2}n$ bits.

Since an oracle string has no physical representation, it is not obvious how to perform traditional string operations like extracting, deleting, or overwriting substrings. We allow such operations by wrapping the input oracle string $T[1\operatorname{{.}\,{.}}n]$ as follows. Every string $S[1\operatorname{{.}\,{.}}m]$ encountered during the algorithm execution (e.g., a substring extracted from $T$ or the computed output string) is physically represented by an array $A\in[1,n]^{m}$ , indicating that $\forall i\in[1,m]:S[i]=T[A[i]]$ . This also holds for the input string $T[1\operatorname{{.}\,{.}}n]$ itself, which is represented by $B[1\operatorname{{.}\,{.}}n]$ with $\forall i\in[1,n]:B[i]=i$ . Hence we can perform operations that modify or create copies of (sub)strings directly on the physical representations. Given the physical representations, we can still perform any symbol comparison in constant time with a single oracle query to $T[1\operatorname{{.}\,{.}}n]$ . We hide this additional level of complexity from now on.
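The wrapping described above might be sketched in Python as follows. The class and method names (`OracleString`, `View`, `equal`, `substring`, `concat`) are hypothetical and chosen only for illustration. Positions are 1-based to match the text, and the query counter is an addition that makes the cost model visible.

```python
class OracleString:
    """Hides the symbols of T; only equality queries between positions are allowed."""

    def __init__(self, symbols):
        self._symbols = list(symbols)
        self.queries = 0  # number of oracle queries asked so far

    def __len__(self):
        return len(self._symbols)

    def equal(self, i, j):
        """Oracle query: is T[i] = T[j]? Positions are 1-based."""
        self.queries += 1
        return self._symbols[i - 1] == self._symbols[j - 1]


class View:
    """A string S represented by an array A such that S[i] = T[A[i]]."""

    def __init__(self, oracle, positions):
        self.oracle = oracle
        self.A = list(positions)

    @classmethod
    def whole(cls, oracle):
        """The input string itself, represented by B[i] = i."""
        return cls(oracle, range(1, len(oracle) + 1))

    def __len__(self):
        return len(self.A)

    def substring(self, i, j):
        """The substring S[i..j], 1-based and inclusive."""
        return View(self.oracle, self.A[i - 1:j])

    def concat(self, other):
        return View(self.oracle, self.A + other.A)

    def equal(self, i, other, j):
        """Compare self[i] with other[j] using a single oracle query."""
        return self.oracle.equal(self.A[i - 1], other.A[j - 1])
```

For example, concatenating the views of $T[1\operatorname{{.}\,{.}}2]$ and $T[4\operatorname{{.}\,{.}}5]$ of a length-5 oracle string yields a view with position array `[1, 2, 4, 5]`, and comparing two of its symbols still costs exactly one oracle query.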

Walking on Strings.

Let $S[1\operatorname{{.}\,{.}}m]$ be a string, and let $n\in\mathbb{N}^{+}$ . Then $f:[1,n]\rightarrow[1,m]$ is a walk on $S$ if and only if $\forall i\in[2,n]:\left\lvert f(i)-f(i-1)\right\rvert\leq 1$ . The walk $f$ on $S$ generates the string $T[1\operatorname{{.}\,{.}}n]$ defined by $\forall i\in[1,n]:T[i]=S[f(i)]$ . More generally, a string $S$ generates another string $T$ if and only if there is a walk on $S$ that generates $T$ . We further say that $S$ generates $T$ from left to right if and only if there is a walk $f$ on $S$ that generates $T$ with $f(1)=1$ and $f(\left\lvert T\right\rvert)=\left\lvert S\right\rvert$ . Trivially, if $S$ generates $T$ from left to right, then $\left\lvert S\right\rvert\leq\left\lvert T\right\rvert$ . The following properties are straight-forward. (The first one is obtained by composing the two generating walks, the second one by concatenating them.)

Observation 2 (Properties of Left-to-Right Generation).

Let $\alpha,\beta,\gamma,\delta$ be strings.

(i)

Transitivity: If $\alpha$ generates $\beta$ from left to right, and $\beta$ generates $\gamma$ from left to right,
then $\alpha$ generates $\gamma$ from left to right.
(ii)

Concatenability: If $\alpha$ generates $\beta$ from left to right, and $\gamma$ generates $\delta$ from left to right,
then $\alpha\gamma$ generates $\beta\delta$ from left to right.

Minimal Generators.

A minimal generator of $T$ is a string that generates $T$ and is of minimal length (i.e., there is no shorter generator). The minimal generator of $T$ is unique up to reversal [22]. Whether or not a generator is minimal can be characterized by using the following types of substrings. A (non-trivial) palindrome is a string $y$ (of length at least two) such that $y=\tilde{y}$ . A twin-palindrome is a string of the form $axb\tilde{x}axb$ for symbols $a, b$ and a string $x$ . A unit-square is a string of the form $a a$ for a symbol $a$ .

Lemma 3 ([22, Lemma 6]).

A string $S$ is a minimal generator of another string $T$ if and only if $S$ generates $T$ and both of the following conditions hold. No suffix or prefix of $S$ is a non-trivial palindrome. No substring of $S$ is a unit-square or a twin-palindrome.

Particularly, a minimal generator can be obtained from $T$ by repeatedly applying the following substitutions until no further substitutions are possible. (The correctness is straight-forward, see Observation 2 and the proof of [22, Lemma 6]).

Lemma 4.

Let $\alpha,\beta,\gamma,x$ be strings, and let $a, b$ be symbols.

(i)

If $\alpha\cdot aa\cdot\beta$ generates $\gamma$ , then so does $\alpha\cdot a\cdot\beta$ .
If $\alpha\cdot aa\cdot\beta$ generates $\gamma$ from left to right, then so does $\alpha\cdot a\cdot\beta$ .
(ii)

If $\alpha\cdot axb\tilde{x}axb\cdot\beta$ generates $\gamma$ , then so does $\alpha\cdot axb\cdot\beta$ .
If $\alpha\cdot axb\tilde{x}axb\cdot\beta$ generates $\gamma$ from left to right, then so does $\alpha\cdot axb\cdot\beta$ .
(iii)

If $axb\tilde{x}a\cdot\alpha$ generates $\gamma$ , then so does $b\tilde{x}a\cdot\alpha$ .
If $\alpha\cdot axb\tilde{x}a$ generates $\gamma$ , then so does $\alpha\cdot axb$ .

Our algorithm for computing the minimal generator performs the following three steps. First, we eliminate all unit-squares by applying Lemma 4(i) until no longer possible. Second, we eliminate twin-palindromes by applying Lemma 4(ii) until no longer possible, which introduces no new substrings of length two, and thus no new unit-squares. Third, we eliminate non-trivial suffix and prefix palindromes by applying Lemma 4(iii) until no longer possible. This introduces no new substrings, and thus the string remains free of unit-squares and twin-palindromes. The first and third step are easily performed using the auxiliary lemmas below. In the remainder of the paper, we focus on the elimination of twin-palindromes in the second step, for which we make the following observation.

Observation 5.

Let $k\in\mathbb{N}^{+}$ . A string $\alpha[1\operatorname{{.}\,{.}}3k+1]$ is a twin-palindrome if and only if $\alpha[1\operatorname{{.}\,{.}}2k+1]$ and $\alpha[k+1\operatorname{{.}\,{.}}3k+1]$ are palindromes (which justifies the term “twin-palindrome”).
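As a sanity check rather than a proof, Observation 5 can be verified exhaustively for short strings. The Python sketch below compares the defining form $axb\tilde{x}axb$ with the two-palindrome characterization. The function names are hypothetical and indices are 0-based.

```python
from itertools import product


def is_palindrome(w):
    return w == w[::-1]


def twin_by_form(w):
    """Check the defining form a x b rev(x) a x b directly."""
    if len(w) < 4 or (len(w) - 1) % 3 != 0:
        return False
    k = (len(w) - 1) // 3  # |axb| = k + 1 and |x| = k - 1
    axb, rev_x = w[:k + 1], w[k + 1:2 * k]
    return rev_x == axb[1:-1][::-1] and w[2 * k:] == axb


def twin_by_observation(w):
    """Check the characterization: w[1..2k+1] and w[k+1..3k+1] are palindromes."""
    if len(w) < 4 or (len(w) - 1) % 3 != 0:
        return False
    k = (len(w) - 1) // 3
    return is_palindrome(w[:2 * k + 1]) and is_palindrome(w[k:])


def observation_holds(alphabet, max_k):
    """Compare both tests on all strings of length 3k+1 for k = 1..max_k."""
    for k in range(1, max_k + 1):
        for symbols in product(alphabet, repeat=3 * k + 1):
            w = "".join(symbols)
            if twin_by_form(w) != twin_by_observation(w):
                return False
    return True
```

Calling `observation_holds("abc", 2)` compares the two tests on all ternary strings of lengths 4 and 7.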

Lemma 6.

Let $T[1\operatorname{{.}\,{.}}n]$ be a string. There is an algorithm that, in ${\mathcal{O}}(n)$ time, computes a string $S$ that generates $T$ from left to right and contains no unit-squares.

Proof.

We start with $S=T[1]$. For each $i\in[2,n]$ in increasing order, we append $T[i]$ to $S$ if and only if $T[i]\neq T[i-1]$, which clearly takes ${\mathcal{O}}(n)$ time and yields the desired result. $\hfill\blacktriangleleft$
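The single pass from the proof of Lemma 6 is short enough to state directly in Python. The function name `remove_unit_squares` is hypothetical, indices are 0-based, and the empty string is handled as an extra convenience.

```python
def remove_unit_squares(T):
    """Keep a symbol only if it differs from its predecessor in T."""
    S = []
    for i, symbol in enumerate(T):
        if i == 0 or symbol != T[i - 1]:
            S.append(symbol)
    return "".join(S)
```

For example, `remove_unit_squares("aabbbac")` collapses the runs `aa` and `bbb` and returns `"abac"`.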

Lemma 7.

Let $T[1\operatorname{{.}\,{.}}n]$ be a string that contains neither unit-squares nor twin-palindromes. There is an algorithm that, in ${\mathcal{O}}(n)$ time, computes a minimal generator of $T$ .

Proof.

We start with a copy of $T$ and modify it as follows. As long as there is a non-trivial prefix palindrome $xa\tilde{x}$ , we replace it with $a\tilde{x}$ . Then, as long as there is a non-trivial suffix palindrome $xa\tilde{x}$ , we replace it with $x a$ . (In both cases, $a$ is a symbol and $x$ is a non-empty string.) This results in a generator of $T$ by Lemma 4(iii). Note that neither of the replacements introduces any new substrings because they merely truncate the string. Hence we introduce no new unit-squares or twin-palindromes, and the resulting string must be a minimal generator of $T$ by Lemma 3.

We will find a non-trivial prefix or suffix palindrome $xa\tilde{x}$ in ${\mathcal{O}}(\left\lvert x\right\rvert)$ time, or determine that none exists in ${\mathcal{O}}(n)$ time. Whenever we spend ${\mathcal{O}}(\left\lvert x\right\rvert)$ time to discover a prefix or suffix palindrome, we also reduce the length of the string by $\left\lvert x\right\rvert$, and thus the total time sums to ${\mathcal{O}}(n)$.

It remains to show how to find a prefix palindrome $xa\tilde{x}$ . We proceed in up to $\left\lceil\log_{2}n\right\rceil$ phases. In phase $j$ (starting with $j=1$ ), we aim to find a prefix palindrome of length in range $(2^{j-1},2^{j}]$ . We run Manacher’s algorithm [19] and find the longest prefix palindrome of $T[1\operatorname{{.}\,{.}}\min(n,2^{j})]$ in ${\mathcal{O}}(2^{j})$ time. If it is non-trivial, then it is of length at least $2^{j-1}$ , since otherwise we would have discovered it in an earlier phase. Also, its length must be odd because $T$ is free of unit-squares (and even-length palindromes contain a unit-square at their center). As soon as we find a non-trivial prefix palindrome, we report it and perform no further phases. If we discover a non-trivial prefix palindrome in phase $j$ , then the time sums to ${\mathcal{O}}(2^{j})$ , which is linear in the length of the palindrome. Otherwise, we finish all phases in ${\mathcal{O}}(n)$ time and there is no non-trivial palindromic prefix. The procedure for suffix palindromes is symmetric. $\hfill\blacktriangleleft$
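The doubling phases of this proof can be sketched in Python as follows. The sketch assumes that the input contains neither unit-squares nor twin-palindromes, as in Lemma 7, so only odd-length palindromes need to be considered. All function names are hypothetical and indices are 0-based. The current string is kept as a window `T[lo:hi]` so that trimming never copies the remainder of the string.

```python
def odd_radii(T):
    """Textbook Manacher: maximal radius of an odd palindrome at each center."""
    n, R, c, r = len(T), [0] * len(T), 0, 0
    for i in range(n):
        if i < r:
            R[i] = min(R[2 * c - i], r - i)
        while i - R[i] - 1 >= 0 and i + R[i] + 1 < n and T[i - R[i] - 1] == T[i + R[i] + 1]:
            R[i] += 1
        if i + R[i] > r:
            c, r = i, i + R[i]
    return R


def longest_odd_prefix_palindrome(W):
    """Length of the longest odd-length palindromic prefix of W (0 if W is empty)."""
    best = 0
    for center, radius in enumerate(odd_radii(W)):
        if radius == center:  # the palindrome at this center starts at position 0
            best = 2 * center + 1
    return best


def find_end_palindrome(T, lo, hi, at_suffix):
    """Length of a non-trivial prefix (or suffix) palindrome of T[lo:hi], or 0."""
    size = 2
    while True:  # phase j inspects a window of length at most 2^j
        if at_suffix:
            window = T[max(lo, hi - size):hi][::-1]
        else:
            window = T[lo:min(hi, lo + size)]
        length = longest_odd_prefix_palindrome(window)
        if length >= 3:
            return length
        if size >= hi - lo:
            return 0
        size *= 2


def trim_end_palindromes(T):
    """Replace prefix palindromes x a rev(x) by a rev(x), then suffix ones by x a."""
    lo, hi = 0, len(T)
    while True:
        length = find_end_palindrome(T, lo, hi, False)
        if length == 0:
            break
        lo += length // 2
    while True:
        length = find_end_palindrome(T, lo, hi, True)
        if length == 0:
            break
        hi -= length // 2
    return T[lo:hi]
```

On the input `"abacdc"`, the prefix palindrome `aba` and the suffix palindrome `cdc` are each cut to their second half and first half, respectively, leaving `"bacd"`.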

3 Eliminating Twin-Palindromes in ${\mathcal{O}}(n\log n)$ Time

In this section, we show how to eliminate twin-palindromes that cross a fixed position. This ultimately results in the following theorem.

Theorem 8.

Let $k\in\mathbb{N}^{+}$ , and let $\alpha[1\operatorname{{.}\,{.}}n]$ , $\beta[1\operatorname{{.}\,{.}}m]$ be strings that contain no twin-palindromes of length at most $k$ . There is an algorithm that computes $n^{\prime}\in[0,n]$ and $m^{\prime}\in[0,m]$ such that

(i)

$\alpha[1\operatorname{{.}\,{.}}n-n^{\prime}]\beta(m^{\prime}\operatorname{{.}\,{.}}m]$ generates $\alpha\beta$ from left to right, and
(ii)

$\alpha[1\operatorname{{.}\,{.}}n-n^{\prime}]\beta(m^{\prime}\operatorname{{.}\,{.}}m]$ contains no twin-palindrome of length at most $k$ , and
(iii)

if $\alpha$ and $\beta$ contain no unit-squares, then neither does $\alpha[1\operatorname{{.}\,{.}}n-n^{\prime}]\beta(m^{\prime}\operatorname{{.}\,{.}}m]$ .

The algorithm runs in ${\mathcal{O}}(k+n^{\prime}+m^{\prime})$ time.

The Theorem follows from a sequence of simpler results, which we describe in the remainder of the section. Before that, we show that the Theorem directly implies an ${\mathcal{O}}(n\log n)$ time algorithm for eliminating all twin-palindromes.

Corollary 9.

Let $T[1\operatorname{{.}\,{.}}n]$ be a string. There is an algorithm that, in ${\mathcal{O}}(n\log n)$ time, computes a string $S$ that generates $T$ from left to right and contains neither twin-palindromes nor unit-squares.

Proof.

Assume that $T$ contains no unit-squares. This assumption is without loss of generality, as otherwise we can eliminate unit-squares with Lemma 6, and then show the correctness of the Corollary for the resulting shorter string. We terminate in constant time if $n=1$. Otherwise, we recursively apply the Corollary to $T_{1}=T[1\operatorname{{.}\,{.}}\left\lfloor\frac{n}{2}\right\rfloor]$ and $T_{2}=T(\left\lfloor\frac{n}{2}\right\rfloor\operatorname{{.}\,{.}}n]$, resulting in strings $T_{1}^{\prime}$ and $T_{2}^{\prime}$ that contain neither twin-palindromes nor unit-squares. Furthermore, $T_{1}^{\prime}$ generates $T_{1}$ from left to right, and $T_{2}^{\prime}$ generates $T_{2}$ from left to right. Hence, by Observation 2(ii), $T_{1}^{\prime}T_{2}^{\prime}$ generates $T$ from left to right. We apply Theorem 8 with $k=n$ to $T_{1}^{\prime}$ and $T_{2}^{\prime}$. In ${\mathcal{O}}(n)$ time, we obtain $n^{\prime}\in[0,\left\lvert T_{1}^{\prime}\right\rvert]$ and $m^{\prime}\in[0,\left\lvert T_{2}^{\prime}\right\rvert]$ such that $S=T_{1}^{\prime}[1\operatorname{{.}\,{.}}\left\lvert T_{1}^{\prime}\right\rvert-n^{\prime}]T_{2}^{\prime}(m^{\prime}\operatorname{{.}\,{.}}\left\lvert T_{2}^{\prime}\right\rvert]$ generates $T_{1}^{\prime}T_{2}^{\prime}$ (and by Observation 2(i) also $T$) from left to right. Furthermore, $S$ contains no unit-squares or twin-palindromes. We spend ${\mathcal{O}}(n)$ time per level of recursion, and ${\mathcal{O}}(n\log n)$ time overall. $\hfill\blacktriangleleft$

Now we prove Theorem 8. We start with an algorithm for finding a shortest twin-palindromic substring (or verifying that none exist) in linear time. At first glance, prioritizing short twin-palindromes might seem counter-intuitive; one might expect that long twin-palindromes should be eliminated early to quickly reduce the size of the input. However, finding long twin-palindromes appears to be more challenging, and finding the shortest one still suffices for proving Theorem 8.

Lemma 10.

Let $\alpha[1\operatorname{{.}\,{.}}n]$ be a string. There is an algorithm that returns the position and length of a shortest twin-palindromic substring of $\alpha$ , or reports that $\alpha$ contains no twin-palindromes, in ${\mathcal{O}}(n)$ time.

Proof.

We compute the maximal radius $R[i]$ of a palindrome centered at each $i\in[1,n]$ , i.e., $R[i]=\max\{h\in[0,\min(i-1,n-i)]\mid\textnormal{$\alpha[i-h\operatorname{{.}\,{.}}i+h]$ is a palindrome}\}$ . A twin-palindrome must be of length $3k+1$ for some positive integer $k$ . Fix $k$ and consider a substring $\alpha[p\operatorname{{.}\,{.}}p+3k]$ . Observation 5 states that $\alpha[p\operatorname{{.}\,{.}}p+3k]$ is a twin-palindrome if and only if $\alpha[p\operatorname{{.}\,{.}}p+2k]$ and $\alpha[p+k\operatorname{{.}\,{.}}p+3k]$ are palindromes, which is the case if and only if $R[p+k]\geq k$ and $R[p+2k]\geq k$ . This dictates a simple algorithm for detecting all twin-palindromes. We consider each pair $\ell,r\in[1,n]$ with $\ell<r$ . Let $k=r-\ell$ and $p=\ell-k$ , which implies $\ell=p+k$ and $r=p+2k$ . If $R[\ell]\geq k$ and $R[r]\geq k$ , then we report $\alpha[p\operatorname{{.}\,{.}}p+3k]$ as a twin-palindrome. There are $\Theta(n^{2})$ pairs, and each pair is processed in constant time.

We cannot afford to consider each pair of positions. Hence we show that ${\mathcal{O}}(n)$ pairs are sufficient to detect a shortest twin-palindrome. Let $\alpha[p\operatorname{{.}\,{.}}p+3k]$ be any such twin-palindrome, i.e., $\alpha$ contains no twin-palindrome of length less than $3k+1$ . Let $\ell=p+k$ and $r=p+2k$ . We already know that $R[\ell]\geq k$ and $R[r]\geq k$ . We claim that $\forall m\in(\ell,r):R[m]<k$ . Indeed, $R[m]\geq k$ with $m\in(\ell,r)$ implies that $\alpha[\ell\operatorname{{.}\,{.}}2m-\ell]$ is a palindrome of length $2(m-\ell)+1$ centered at position $m$ . Due to $R[\ell]\geq k>m-\ell$ , we know that $\alpha[2\ell-m\operatorname{{.}\,{.}}m]$ is a palindrome of length $2(m-\ell)+1$ centered at position $\ell$ . Thus, Observation 5 implies that $\alpha[2\ell-m\operatorname{{.}\,{.}}2m-\ell]$ is a twin-palindrome of length $3(m-\ell)+1<3k+1$ . However, this contradicts the fact that $\alpha$ contains no twin-palindrome of length less than $3k+1$ .

Now we exploit the new insights algorithmically. For each position $i\in[1,n]$ , we compute $\textnormal{{prv}}[i]=\max(\{j\in[1,i)\mid R[j]\geq R[i]\}\cup\{0\})$ and $\textnormal{{nxt}}[i]=\min(\{j\in(i,n]\mid R[j]\geq R[i]\}\cup\{n+1\})$ , i.e., the positions of the previous and next non-smaller value of $R[i]$ in $R$ . For a shortest twin-palindrome $\alpha[p\operatorname{{.}\,{.}}p+3k]$ with $\ell=p+k$ and $r=p+2k$ , we have shown that $R[\ell]\geq k$ , $R[r]\geq k$ , and $\forall m\in(\ell,r):R[m]<k$ . Hence either $\textnormal{{nxt}}[\ell]=r$ or $\textnormal{{prv}}[r]=\ell$ (or both). Thus, instead of processing all the $\Theta(n^{2})$ possible pairs of positions, we process only the ${\mathcal{O}}(n)$ pairs $\{\ (\ell,\textnormal{{nxt}}[\ell])\mid\ell\in[1,n]\textnormal{\ and\ }\textnormal{{nxt}}[\ell]\leq n\ \}\cup\{\ (\textnormal{{prv}}[r],r)\mid r\in[1,n]\textnormal{\ and\ }\textnormal{{prv}}[r]>0\ \}$ . If no twin-palindrome is found, we report that $\alpha$ contains no twin-palindrome. Otherwise, among the discovered twin-palindromes, we choose and report one of minimal length. Given $R$ , nxt, and prv, we can generate and process the pairs in ${\mathcal{O}}(n)$ time. Computing $R$ takes ${\mathcal{O}}(n)$ time with Manacher’s algorithm [19]. Computing nxt and prv takes ${\mathcal{O}}(n)$ time with a simple folklore algorithm (see, e.g., [5, Lemma 1]). $\hfill\blacktriangleleft$
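The procedure from the proof of Lemma 10 can be sketched in Python as follows. The function names are hypothetical, indices are 0-based (so the sentinels are $-1$ and $n$ instead of $0$ and $n+1$), and the stack-based computation of prv and nxt is one common way to realize the folklore algorithm, not necessarily the one from [5].

```python
def odd_radii(a):
    """Textbook Manacher: maximal radius of an odd palindrome at each center."""
    n, R, c, r = len(a), [0] * len(a), 0, 0
    for i in range(n):
        if i < r:
            R[i] = min(R[2 * c - i], r - i)
        while i - R[i] - 1 >= 0 and i + R[i] + 1 < n and a[i - R[i] - 1] == a[i + R[i] + 1]:
            R[i] += 1
        if i + R[i] > r:
            c, r = i, i + R[i]
    return R


def nearest_not_smaller(R):
    """prv[i] / nxt[i]: nearest index to the left / right whose value is >= R[i]."""
    n = len(R)
    prv, nxt = [-1] * n, [n] * n
    stack = []
    for i in range(n):
        while stack and R[stack[-1]] < R[i]:
            stack.pop()
        if stack:
            prv[i] = stack[-1]
        stack.append(i)
    stack = []
    for i in range(n - 1, -1, -1):
        while stack and R[stack[-1]] < R[i]:
            stack.pop()
        if stack:
            nxt[i] = stack[-1]
        stack.append(i)
    return prv, nxt


def shortest_twin_palindrome(a):
    """Return (p, length) such that a[p:p+length] is a shortest twin-palindrome, or None."""
    n = len(a)
    R = odd_radii(a)
    prv, nxt = nearest_not_smaller(R)
    pairs = [(l, nxt[l]) for l in range(n) if nxt[l] < n]
    pairs += [(prv[r], r) for r in range(n) if prv[r] >= 0]
    best = None
    for l, r in pairs:
        k = r - l
        # By Observation 5, centers l and r with radius >= k form a twin-palindrome.
        if R[l] >= k and R[r] >= k:
            if best is None or 3 * k + 1 < best[1]:
                best = (l - k, 3 * k + 1)
    return best
```

For example, on `"xabcbabcy"` the radii are `[0, 0, 0, 2, 0, 2, 0, 0, 0]`, the pair of centers `(3, 5)` is among the candidates, and the function reports the twin-palindrome `abcbabc` as `(1, 7)`.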

Given two strings that contain no twin-palindromes (possibly substrings of a longer string, in the representation described in Section 2), we can use the Lemma above to find a twin-palindrome in their concatenation. Most importantly, this yields an output-sensitive algorithm: if the algorithm reports a twin-palindrome, its runtime is linear in the length of that twin-palindrome; otherwise, the runtime is linear in the combined length of the input strings. This property will later allow us to amortize the time over the number of symbols removed from the input when computing the minimal generator.

Lemma 11.

Let $\alpha[1\operatorname{{.}\,{.}}n]$ , $\beta[1\operatorname{{.}\,{.}}m]$ be strings that contain no twin-palindromes. There is an algorithm that computes $n^{\prime}\in[1,n]$ and $m^{\prime}\in[1,m]$ such that $\alpha(n-n^{\prime}\operatorname{{.}\,{.}}n]\beta[1\operatorname{{.}\,{.}}m^{\prime}]$ is a twin-palindrome in ${\mathcal{O}}(n^{\prime}+m^{\prime})$ time, or outputs that no such values exist in ${\mathcal{O}}(n+m)$ time.

Proof.

Similarly to Lemma 7, we proceed in up to $\left\lceil\log_{2}(n+m)\right\rceil$ phases. In phase $j$ (starting with $j=1$ ), we aim to find a twin-palindrome $\alpha(n-n^{\prime}\operatorname{{.}\,{.}}n]\beta[1\operatorname{{.}\,{.}}m^{\prime}]$ with $n^{\prime}+m^{\prime}\in(2^{j-1},2^{j}]$ . As soon as we find such $n^{\prime},m^{\prime}$ , we return them without executing further phases. We will perform phase $j$ in ${\mathcal{O}}(2^{j})$ time. Hence, if we find $n^{\prime},m^{\prime}$ in phase $j$ , the total time sums to ${\mathcal{O}}(2^{j})$ , which is ${\mathcal{O}}(n^{\prime}+m^{\prime})$ due to $n^{\prime}+m^{\prime}\in(2^{j-1},2^{j}]$ . If all phases finish without finding $n^{\prime},m^{\prime}$ , then the time sums to ${\mathcal{O}}(2^{\left\lceil\log_{2}(n+m)\right\rceil})={\mathcal{O}}(n+m)$ .

It remains to show how to perform phase $j$ in ${\mathcal{O}}(2^{j})$ time. Since we aim to find $n^{\prime},m^{\prime}\leq 2^{j}$ , it suffices to consider $\alpha^{\prime}=\alpha[\max(1,n-2^{j}+1)\operatorname{{.}\,{.}}n]$ and $\beta^{\prime}=\beta[1\operatorname{{.}\,{.}}\min(m,2^{j})]$ of total length ${\mathcal{O}}(2^{j})$ . We apply Lemma 10 to $\gamma=\alpha^{\prime}\beta^{\prime}$ in ${\mathcal{O}}(2^{j})$ time. Since $\alpha^{\prime}=\gamma[1\operatorname{{.}\,{.}}\min(n,2^{j})]$ and $\beta^{\prime}=\gamma(\min(n,2^{j})\operatorname{{.}\,{.}}\left\lvert\gamma\right\rvert]$ are free of twin-palindromes, every twin-palindrome $\gamma(a\operatorname{{.}\,{.}}b]$ satisfies $a<\min(n,2^{j})<b$ . Hence, if Lemma 10 reports $\gamma(a\operatorname{{.}\,{.}}b]$ as a twin-palindrome, we return $n^{\prime}=\min(n,2^{j})-a$ and $m^{\prime}=b-\min(n,2^{j})$ . Note that $b-a>2^{j-1}$ , since otherwise the twin-palindrome would have already been discovered during the previous phase. If Lemma 10 reports no twin-palindrome, then $\alpha^{\prime}\beta^{\prime}$ contains no twin-palindrome. $\hfill\blacktriangleleft$

We are now ready to show Theorem 8. The algorithm first searches for a twin-palindrome of length $\ell\leq k$ crossing the boundary between $\alpha$ and $\beta$. By Lemma 11, this step takes ${\mathcal{O}}(\ell)$ time if such a twin-palindrome exists, and ${\mathcal{O}}(k)$ time otherwise. If at least half of the twin-palindrome $axb\tilde{x}axb$ lies in $\alpha$, the algorithm keeps only its leftmost copy of $axb$; otherwise, it keeps only its rightmost copy. In both cases, the remainder of the twin-palindrome is cut out of $\alpha$ and $\beta$, shortening them by $\Omega(\ell)$ symbols in total, and the algorithm is applied recursively to the resulting strings. As a result, the total time is linear in the sum of $k$ and the number of symbols that were cut off.

See Theorem 8.

Proof.

If at least one of the strings is empty, then we trivially return $n^{\prime}=m^{\prime}=0$ . If neither $\alpha$ nor $\beta$ contains unit-squares, then we can without loss of generality assume that $\alpha\beta$ contains no unit-squares. This is because the only possible unit-square in $\alpha\beta$ is $\alpha[n]\beta[1]$ , and if $\alpha[n]=\beta[1]$ we can simply run the algorithm for $\alpha$ and $\beta[2\operatorname{{.}\,{.}}m]$ instead.

If both strings are non-empty, we apply Lemma 11 to ${\alpha_{0}=\alpha(\max(0,n-k)\operatorname{{.}\,{.}}n]}$ and ${\beta_{0}=\beta[1\operatorname{{.}\,{.}}\min(m,k)]}$ . This either results in $n_{0}\in[1,\min(n,k)]$ and $m_{0}\in[1,\min(m,k)]$ such that $\alpha(n-n_{0}\operatorname{{.}\,{.}}n]\beta[1\operatorname{{.}\,{.}}m_{0}]$ is a twin-palindrome in ${\mathcal{O}}(n_{0}+m_{0})$ time, or determines that no such values exist in ${\mathcal{O}}(k)$ time. Since $\alpha$ and $\beta$ do not contain any twin-palindromes of length at most $k$ , it is clear that $\alpha\beta$ contains no twin-palindrome of length at most $k$ if and only if no values $n_{0},m_{0}$ are found. Hence, if no values are found, we output $n^{\prime}=m^{\prime}=0$ .

Otherwise, there are symbols $a, b$ and a string $x$ such that $\alpha(n-n_{0}\operatorname{{.}\,{.}}n]\beta[1\operatorname{{.}\,{.}}m_{0}]=axb\tilde{x}axb$ . Let $h=\left\lvert axb\right\rvert\leq(n_{0}+m_{0})/2$ . If $n_{0}\geq m_{0}$ , then $h\leq n_{0}$ and ${\alpha(n-n_{0}\operatorname{{.}\,{.}}n-n_{0}+h]}=axb$ . By Lemma 4(ii), $\alpha_{1}\beta_{1}$ with $\alpha_{1}={\alpha[1\operatorname{{.}\,{.}}n-n_{0}+h]}$ and $\beta_{1}=\beta(m_{0}\operatorname{{.}\,{.}}m]$ generates $\alpha\beta$ from left to right. Similarly, if $n_{0}<m_{0}$ , then $h<m_{0}$ and $\beta(m_{0}-h\operatorname{{.}\,{.}}m_{0}]=axb$ . In this case, again by Lemma 4(ii), $\alpha_{1}\beta_{1}$ with $\alpha_{1}=\alpha[1\operatorname{{.}\,{.}}n-n_{0}]$ and $\beta_{1}=\beta(m_{0}-h\operatorname{{.}\,{.}}m]$ generates $\alpha\beta$ from left to right. Either way, $h\leq(n_{0}+m_{0})/2$ implies $\left\lvert\alpha_{1}\beta_{1}\right\rvert=n+m-n_{0}-m_{0}+h\leq n+m-(n_{0}+m_{0})/2.$

We finish the computation by applying Theorem 8 recursively to $\alpha_{1}$ and $\beta_{1}$ (where we do not need to materialize $\alpha_{1}$ and $\beta_{1}$ , as they are respectively a prefix and a suffix of $\alpha$ and $\beta$ ). This results in values $n_{1}$ and $m_{1}$ such that $\alpha_{1}[1\operatorname{{.}\,{.}}\left\lvert\alpha_{1}\right\rvert-n_{1}]\beta_{1}(m_{1}\operatorname{{.}\,{.}}\left\lvert\beta_{1}\right\rvert]$ generates $\alpha_{1}\beta_{1}$ from left to right and contains no twin-palindrome of length at most $k$ . We have already established that $\alpha_{1}\beta_{1}$ generates $\alpha\beta$ from left to right, and hence also $\alpha_{1}[1\operatorname{{.}\,{.}}\left\lvert\alpha_{1}\right\rvert-n_{1}]\beta_{1}(m_{1}\operatorname{{.}\,{.}}\left\lvert\beta_{1}\right\rvert]$ generates $\alpha\beta$ from left to right due to Observation 2(i). Thus, we return $n^{\prime}=n-\left\lvert\alpha_{1}\right\rvert+n_{1}$ and $m^{\prime}=m-\left\lvert\beta_{1}\right\rvert+m_{1}$ . Note that we obtained $\alpha_{1}\beta_{1}$ by replacing $axb\tilde{x}axb$ with $a x b$ in $\alpha\beta$ . This clearly does not introduce any new substrings of length two, and thus, if $\alpha\beta$ contains no unit-square, then neither does $\alpha_{1}\beta_{1}$ , and (recursively) neither does $\alpha_{1}[1\operatorname{{.}\,{.}}\left\lvert\alpha_{1}\right\rvert-n_{1}]\beta_{1}(m_{1}\operatorname{{.}\,{.}}\left\lvert\beta_{1}\right\rvert]=\alpha[1\operatorname{{.}\,{.}}n-n^{\prime}]\beta(m^{\prime}\operatorname{{.}\,{.}}m]$ .

Recursively applying the Theorem to $\alpha_{1}$ and $\beta_{1}$ takes ${\mathcal{O}}(k+n_{1}+m_{1})$ time. (This is true by induction over the sum of output values, as we have already shown the correctness for the base case where $n^{\prime}=m^{\prime}=0$ .) Apart from that, we spend ${\mathcal{O}}(n_{0}+m_{0})$ time, dominated by applying Lemma 11. The total time is ${\mathcal{O}}(k+n_{0}+n_{1}+m_{0}+m_{1})$ , which is ${\mathcal{O}}(k+n-\left\lvert\alpha_{1}\right\rvert+n_{1}+m-\left\lvert\beta_{1}\right\rvert+m_{1})={\mathcal{O}}(k+n^{\prime}+m^{\prime})$ due to $\left\lvert\alpha_{1}\beta_{1}\right\rvert\leq n+m-(n_{0}+m_{0})/2$ . $\hfill\blacktriangleleft$
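The cutting rule used in this proof can be illustrated with a short Python sketch. It is written as a loop rather than a recursion, and it makes several simplifying assumptions that are not from the paper: the boundary search of Lemma 11 is replaced by a naive scan (`find_crossing` is a hypothetical helper), a unit-square at the boundary is not treated, and the inputs are assumed to contain no twin-palindromes of length at most $k$ . Indices are 0-based.

```python
def is_twin_palindrome(s):
    """True iff s has the form a x b rev(x) a x b (length 3k+1, k >= 1)."""
    if len(s) < 4 or (len(s) - 1) % 3 != 0:
        return False
    k = (len(s) - 1) // 3
    left, right = s[:2 * k + 1], s[k:]
    return left == left[::-1] and right == right[::-1]


def find_crossing(alpha, beta, k):
    """Naive stand-in for Lemma 11: shortest crossing twin-palindrome of length <= k."""
    for length in range(4, k + 1, 3):
        for n0 in range(1, length):
            m0 = length - n0
            if n0 <= len(alpha) and m0 <= len(beta):
                if is_twin_palindrome(alpha[len(alpha) - n0:] + beta[:m0]):
                    return n0, m0
    return None


def merge(alpha, beta, k):
    """Return (n', m') such that alpha[:n-n'] + beta[m':] has no crossing
    twin-palindrome of length <= k (sketch of the rule in the proof)."""
    n, m = len(alpha), len(beta)
    while alpha and beta:
        found = find_crossing(alpha, beta, k)
        if found is None:
            break
        n0, m0 = found
        h = (n0 + m0 + 2) // 3            # h = |axb|, since n0 + m0 = 3h - 2
        if n0 >= m0:
            # axb lies inside alpha: keep it there, drop the rest
            alpha = alpha[:len(alpha) - n0 + h]
            beta = beta[m0:]
        else:
            # the final axb lies inside beta: keep that copy
            alpha = alpha[:len(alpha) - n0]
            beta = beta[m0 - h:]
    return n - len(alpha), m - len(beta)
```

For example, `merge("xa", "bab", 4)` replaces the crossing `abab` by `ab` and returns `(1, 1)`, corresponding to the output string `xab`.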

4 Eliminating Twin-Palindromes in ${\mathcal{O}}(n\log^{*}n)$ Time

In this section, we improve the time to ${\mathcal{O}}(n\log^{*}n)$ . In principle, the solution is similar to the ${\mathcal{O}}(n\log n)$ time algorithm from Corollary 9, which cuts $T$ into halves, eliminates the twin-palindromes in each half recursively, and finally eliminates all the twin-palindromes across the boundary. We accelerate the algorithm by instead cutting $T$ into blocks of size roughly $(\log_{2}n)^{2}$ , which reduces the levels of recursion from ${\mathcal{O}}(\log n)$ to ${\mathcal{O}}(\log^{*}n)$ . However, we need a much more sophisticated algorithm for eliminating the twin-palindromes that cross any of the block boundaries. The algorithm is built on the following structural lemma.

Lemma 12.

Let $\alpha[1\operatorname{{.}\,{.}}n]$ be a string that contains neither twin-palindromes nor unit-squares.

(i)

The string $\alpha$ has ${\mathcal{O}}(\log n)$ palindromic suffixes.
(ii)

For any symbol $b$ , there is at most one twin-palindromic suffix of $\alpha\cdot b$ , and (if it exists) it is of length $3k+1$ , where $2k+1$ is the length of the longest palindromic suffix of $\alpha\cdot b$ .

Proof.

For both properties, we only have to consider palindromes of odd length because $\alpha$ contains no unit-square, and every palindrome of even length contains a unit-square at its center. For (i), assume that $\alpha$ has two non-trivial palindromic suffixes of respective lengths $2k_{1}+1$ and $2k_{2}+1$ such that $k_{2}\in(k_{1},2k_{1}]$ . Let $\ell=k_{2}-k_{1}\leq k_{1}$ . Then $\alpha[n-2k_{1}\operatorname{{.}\,{.}}n]$ is a palindrome centered at position $n-k_{1}$ , and its central portion of length $2\ell+1$ is $\alpha[n-k_{1}-\ell\operatorname{{.}\,{.}}n-k_{1}+\ell]=\alpha[n-k_{2}\operatorname{{.}\,{.}}n-k_{2}+2\ell]$ . Similarly, $\alpha[n-2k_{2}\operatorname{{.}\,{.}}n]$ is a palindrome centered at $n-k_{2}$ , and its central portion of length $2\ell+1$ is $\alpha[n-k_{2}-\ell\operatorname{{.}\,{.}}n-k_{2}+\ell]$ . However, by Observation 5, if $\alpha[n-k_{2}\operatorname{{.}\,{.}}n-k_{2}+2\ell]$ and $\alpha[n-k_{2}-\ell\operatorname{{.}\,{.}}n-k_{2}+\ell]$ are palindromes, then $\alpha[n-k_{2}-\ell\operatorname{{.}\,{.}}n-k_{2}+2\ell]$ is a twin-palindrome, which is a contradiction. (See Figure 1(a).) Therefore, if we consider all the non-trivial palindromic suffixes in increasing order of length, then each suffix must be over twice as long as the previous one, and hence their number is ${\mathcal{O}}(\log n)$ .

(a) The string $\alpha[1\operatorname{{.}\,{.}}n]$ has palindromic suffixes $\alpha[n-2k_{1}\operatorname{{.}\,{.}}n]$ and $\alpha[n-2k_{2}\operatorname{{.}\,{.}}n]$ with ${k_{2}\in(k_{1},2k_{1}]}$ . This implies a twin-palindrome $\alpha[n-k_{2}-\ell\operatorname{{.}\,{.}}n-k_{2}+2\ell]$ , where $\ell=k_{2}-k_{1}$ .

(b) The string $\beta[1\operatorname{{.}\,{.}}m]=\alpha\cdot b$ has palindromic suffix $\beta[m-2k_{2}\operatorname{{.}\,{.}}m]$ and twin-palindromic suffix ${\beta[m-3k_{1}\operatorname{{.}\,{.}}m]}$ with $3k_{1}+1<2k_{2}$ . By mirroring the twin-palindrome over the center of the palindrome (indicated by the dotted line), we obtain another twin-palindrome ${\beta[m-2k_{2}\operatorname{{.}\,{.}}m-2k_{2}+3k_{1}]}={\textnormal{{rev}}(\beta[m-3k_{1}\operatorname{{.}\,{.}}m])}$ , which is a substring of $\alpha$ .

Figure 1: Supplementary drawings for the proof of Lemma 12.

For (ii), let $m=n+1$ and $\beta[1\operatorname{{.}\,{.}}m]=\alpha\cdot b$ . Let $k_{2}$ be chosen such that $\beta[m-2k_{2}\operatorname{{.}\,{.}}m]$ is the longest palindromic suffix of $\beta$ . If $\beta$ has some twin-palindromic suffix $\beta[m-3k_{1}\operatorname{{.}\,{.}}m]$ , then it also has palindromic suffix $\beta[m-2k_{1}\operatorname{{.}\,{.}}m]$ , which implies $k_{1}\leq k_{2}$ . If $k_{1}=k_{2}$ , then $\beta[m-3k_{1}\operatorname{{.}\,{.}}m]$ is the claimed unique twin-palindromic suffix. It remains to consider the case $k_{1}<k_{2}$ . In a moment, we will show that $k_{1}<k_{2}$ implies $3k_{1}+1<2k_{2}$ , i.e., the twin-palindromic suffix $\beta[m-3k_{1}\operatorname{{.}\,{.}}m]$ is of length less than $2k_{2}$ . Then, due to the palindromic suffix $\beta[m-2k_{2}\operatorname{{.}\,{.}}m]$ , it holds $\beta[m-2k_{2}\operatorname{{.}\,{.}}m-2k_{2}+3k_{1}]=\textnormal{{rev}}(\beta[m-3k_{1}\operatorname{{.}\,{.}}m])$ . The reversal of a twin-palindrome is also a twin-palindrome, and thus $\beta[m-2k_{2}\operatorname{{.}\,{.}}m-2k_{2}+3k_{1}]$ is a twin-palindromic substring of $\alpha=\beta[1\operatorname{{.}\,{.}}m)$ (particularly due to $m-2k_{2}+3k_{1}<m$ ; see Figure 1(b)). This contradicts the fact that $\alpha$ contains no twin-palindromes.

Finally, we show that $k_{1}<k_{2}$ indeed implies $3k_{1}+1<2k_{2}$ . If $k_{1}>1$ , then we observe that $\alpha$ has non-trivial suffix palindromes of respective lengths $2k_{1}-1=2(k_{1}-1)+1$ and $2k_{2}-1=2(k_{2}-1)+1$ (by truncating the palindromes $\beta[m-2k_{1}\operatorname{{.}\,{.}}m]$ and $\beta[m-2k_{2}\operatorname{{.}\,{.}}m]$ by one symbol on either side). As seen in the proof of (i), it holds $k_{2}-1\notin(k_{1}-1,2k_{1}-2]$ . This implies $k_{2}\geq 2k_{1}$ , which in turn implies $3k_{1}+1\leq 3k_{2}/2+1<2k_{2}$ . (The latter inequality is due to $1<k_{1}<k_{2}$ and thus $k_{2}\geq 3$ .) If $k_{1}=1$ , then the twin-palindromic suffix at hand is $\beta[m-3\operatorname{{.}\,{.}}m]=\alpha[m-3\operatorname{{.}\,{.}}m)\cdot b=abab$ with $a=\alpha[m-1]$ . If $k_{2}=2$ then $\beta[m-4\operatorname{{.}\,{.}}m]=babab$ . However, then $\alpha$ has twin-palindromic suffix $\alpha[m-4\operatorname{{.}\,{.}}m)=baba$ , which is a contradiction. Hence $k_{2}>2$ , which implies $3k_{1}+1=4<2k_{2}$ . $\hfill\blacktriangleleft$
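As a sanity check, Lemma 12(ii) can be tested exhaustively on short strings. The following Python sketch enumerates all strings over a small alphabet that contain neither twin-palindromes nor unit-squares and verifies the claim for every extension by one symbol. It assumes that the appended symbol $b$ differs from the last symbol of $\alpha$ (so that $\alpha\cdot b$ has no unit-square), and it uses the form $axb\tilde{x}axb$ from the abstract as the definition of a twin-palindrome. All function names are hypothetical.

```python
def is_palindrome(s):
    return s == s[::-1]


def is_twin_palindrome(s):
    """True iff s has the form a x b rev(x) a x b (length 3k+1, k >= 1)."""
    if len(s) < 4 or (len(s) - 1) % 3 != 0:
        return False
    k = (len(s) - 1) // 3
    return is_palindrome(s[:2 * k + 1]) and is_palindrome(s[k:])


def twin_suffix_lengths(s):
    """Lengths of all twin-palindromic suffixes of s."""
    return [length for length in range(4, len(s) + 1, 3)
            if is_twin_palindrome(s[len(s) - length:])]


def longest_odd_palindromic_suffix(s):
    """Length 2k+1 of the longest odd-length palindromic suffix of s."""
    return max(length for length in range(1, len(s) + 1, 2)
               if is_palindrome(s[len(s) - length:]))


def check_lemma_12_ii(max_len, alphabet="abc"):
    """Brute-force check of Lemma 12(ii) for all valid alpha of length < max_len."""
    stack = list(alphabet)                # every single symbol is a valid alpha
    while stack:
        alpha = stack.pop()
        for b in alphabet:
            if b == alpha[-1]:
                continue                  # assumption: no unit-square at the end
            beta = alpha + b
            twins = twin_suffix_lengths(beta)
            if len(twins) > 1:
                return False              # more than one twin-palindromic suffix
            if twins:
                k = (longest_odd_palindromic_suffix(beta) - 1) // 2
                if twins[0] != 3 * k + 1:
                    return False          # length differs from the claimed 3k+1
            elif len(beta) <= max_len:
                stack.append(beta)        # beta is again a valid alpha
    return True
```

Since every string on the stack is obtained by extending a valid string without creating a twin-palindromic suffix, only suffixes need to be inspected in each step.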

Theorem 13.

Let $T[1\operatorname{{.}\,{.}}n]$ be a string that contains neither unit-squares nor twin-palindromes of length less than $(\log_{2}n)^{2}$ . There is an algorithm that, in ${\mathcal{O}}(n)$ time, computes a string $S$ that generates $T$ from left to right and contains neither twin-palindromes nor unit-squares.

Proof.

Let $s=\left\lceil(\log_{2}n)^{2}\right\rceil$ . We use a modified version of Manacher’s algorithm for maximal palindromic substrings [19]. In each step, the algorithm maintains the following data structures:

(1)
Strings $\alpha[1\operatorname{{.}\,{.}}a]$ and $\beta[1\operatorname{{.}\,{.}}b]$ are the current working strings such that
1. (i)
  
  $\alpha\beta$ generates $T$ from left to right, and
2. (ii)
  
  $\alpha\beta$ contains neither unit-squares nor twin-palindromes of length less than $s$ , and
3. (iii)
  
  $\alpha$ contains no twin-palindromes.
(2)
List $C=[c_{1},c_{2},\dots,c_{q}]$ with $c_{1}<c_{2}<\dots<c_{q}$ stores a subset of $[1,a]$ such that
1. (i)
  
  if $c\in[1,a]$ is the center of a palindromic suffix of $\alpha$ , then $c$ is in $C$ (but not every element of $C$ is necessarily the center of a palindromic suffix of $\alpha$ ), and
2. (ii)
  
  $c_{1}$ is the center of the longest palindromic suffix of $\alpha$ (hence $C$ is non-empty).
(3)

A prefix of array $R[1\operatorname{{.}\,{.}}n]$ contains, for each $i\in[1,a]\setminus\{c_{1},c_{2},\dots,c_{q}\}$ , the maximal radius of a palindrome centered at position $i$ with respect to $\alpha$ , i.e.,

$R[i]=\max\{h\in[0,\min(i-1,a-i)]\mid\alpha[i-h\operatorname{{.}\,{.}}i+h]\textnormal{\ is a palindrome}\}.$
(4)

A prefix of array $E[1\operatorname{{.}\,{.}}n]$ contains, for each $j\in[1,a]$ , a list of centers defined as follows. The list $E[j]$ contains exactly all the $i\in[1,a]\setminus\{c_{1},c_{2},\dots,c_{q}\}$ for which $i+R[i]=j$ . (Informally, we store each center at the end position of its maximal palindrome.)

We initialize the algorithm with $\alpha=T[1]$ , $\beta=T[2\operatorname{{.}\,{.}}n]$ , and $C=[c_{1}]$ with $c_{1}=1$ . Without loss of generality, we assume that $T[1]$ has no occurrence in $T[2\operatorname{{.}\,{.}}n]$ (which can be achieved by prepending a unique symbol in front of $T$ ). This will ensure that $\alpha$ never becomes empty. Now we alternate between appending a symbol and eliminating twin-palindromes. Appending a symbol means that we remove the first symbol of $\beta$ and append it as the last symbol of $\alpha$ , i.e., we replace $\alpha$ with $\alpha\cdot\beta[1]$ , and $\beta$ with $\beta[2\operatorname{{.}\,{.}}b]$ . Since their concatenation $\alpha\beta$ does not change, Properties (1i) and (1ii) remain satisfied. As part of the appending routine, we update $C$ , $R$ , and $E$ so that they satisfy Properties (2), (3), and (4), which we describe in detail below. Afterwards, only Property (1iii) is possibly not satisfied, i.e., it might be that the new $\alpha$ contains a twin-palindrome. In this case, the eliminating subroutine truncates possibly both $\alpha$ and $\beta$ and updates $C$ , $R$ , and $E$ so that all properties are satisfied. If $\beta$ is empty after running the eliminating subroutine, then $\alpha$ generates $T$ from left to right and contains neither twin-palindromes nor unit-squares.

Appending a Symbol.

Now we describe how to maintain $C$ , $R$ , and $E$ after appending a symbol. By appending a symbol, we transitioned from working strings $\alpha[1\operatorname{{.}\,{.}}a]$ and $\beta[1\operatorname{{.}\,{.}}b]$ to $\alpha\cdot\beta[1]$ and $\beta[2\operatorname{{.}\,{.}}b]$ . The list $C$ currently contains values $c_{1},c_{2},\dots,c_{q}$ , and we append the newly added position $c_{q+1}=a+1$ to its end. If $\alpha\cdot\beta[1]$ has a palindromic suffix with center $c<a+1$ , then also $\alpha$ has a palindromic suffix with center $c$ . Hence the center of the longest palindromic suffix of $\alpha\cdot\beta[1]$ is already contained in $C$ . We consider the positions $c_{i}$ with $i\in[1,q]$ in increasing order. We first check if $c_{i}$ is the center of a palindromic suffix of $\alpha$ , which is always the case for $c_{1}$ . For $c_{i}$ with $i\neq 1$ , we exploit that $\alpha[2c_{1}-a\operatorname{{.}\,{.}}a]$ is a palindrome, which implies $\alpha[2c_{i}-a\operatorname{{.}\,{.}}a]=\textnormal{{rev}}(\alpha[2c_{1}-a\operatorname{{.}\,{.}}2c_{1}-2c_{i}+a])$ . Hence $\alpha[2c_{i}-a\operatorname{{.}\,{.}}a]$ is a palindrome centered at position $c_{i}$ if and only if $\alpha[2c_{1}-a\operatorname{{.}\,{.}}2c_{1}-2c_{i}+a]$ is a palindrome centered at position $c_{i}^{\prime}=2c_{1}-c_{i}<c_{1}$ , which is the case if and only if $R[c_{i}^{\prime}]\geq a-c_{i}$ . We have already computed $R[c_{i}^{\prime}]$ because $c_{i}^{\prime}$ cannot be in $C$ due to $c_{i}^{\prime}<c_{1}$ . Therefore, we can check if $\alpha[2c_{i}-a\operatorname{{.}\,{.}}a]$ is a palindromic suffix in constant time. If $\alpha[2c_{i}-a\operatorname{{.}\,{.}}a]$ is a palindromic suffix of $\alpha$ , we check if $\alpha[2c_{i}-a-1\operatorname{{.}\,{.}}a]\beta[1]$ is a palindromic suffix of $\alpha\cdot\beta[1]$ , which is the case if and only if $\alpha[2c_{i}-a-1]=\beta[1]$ . If $\alpha[2c_{i}-a-1]\neq\beta[1]$ , then we proceed with the next larger value of $i$ . Otherwise, we know that $\alpha[2c_{i}-a-1\operatorname{{.}\,{.}}a]\beta[1]$ is the longest palindromic suffix of $\alpha\cdot\beta[1]$ , and its center is $c_{i}$ . If no $i\in[1,q]$ leads to a palindromic suffix of $\alpha\cdot\beta[1]$ , then the longest palindromic suffix is $\beta[1]$ with center $c_{q+1}=a+1$ .

From now on, let $i\in[1,q+1]$ be the value such that $c_{i}$ is the center of the longest palindromic suffix of $\alpha\cdot\beta[1]$ . Now we update $C$ , $R$ , and $E$ . We remove $c_{1},c_{2},\dots,c_{i-1}$ from the list. If we actually removed positions from the list (i.e., $i>1$ ), we have to update $R$ and $E$ accordingly. In this case, it is clear that $\alpha[2c_{1}-a\operatorname{{.}\,{.}}a]$ is the maximal palindrome with center $c_{1}$ with respect to the string $\alpha\beta$ . We assign $R[c_{1}]=a-c_{1}$ and append $c_{1}$ to $E[a]$ . For the remaining $c_{j}$ with $j\in[2,i)$ , we use the same observation as before: it holds $\alpha[2c_{j}-a\operatorname{{.}\,{.}}a]=\textnormal{{rev}}(\alpha[2c_{1}-a\operatorname{{.}\,{.}}2c_{1}-2c_{j}+a])$ , and we know that this string is not a palindrome. Hence the radii of the maximal palindromes centered at positions $c_{j}^{\prime}=2c_{1}-c_{j}$ and $c_{j}$ are identical, and we assign $R[c_{j}]=R[c_{j}^{\prime}]$ and append $c_{j}$ to $E[c_{j}+R[c_{j}^{\prime}]]$ .

We spent ${\mathcal{O}}(i)$ time for computing $i$ , removing $i-1$ elements from $C$ , and updating the corresponding entries in $R$ and $E$ . We also added $a+1$ to $C$ , and thus we can amortize the ${\mathcal{O}}(i)$ time over the $i$ elements that were either added to or removed from $C$ . Each removed element was previously added, and thus the overall time for the appending subroutine is linear in the number of times we add an element to $C$ .
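For orientation, the following Python sketch shows the classic (unmodified) algorithm of Manacher for odd-length palindromes, which uses the same mirroring argument as above: the radius at a position inside a known palindrome is bounded from below by the radius at its mirror position. This is the textbook variant and not the modified algorithm of this proof; in particular, it maintains neither the list $C$ nor the array $E$ . The function name `odd_radii` is hypothetical and indices are 0-based.

```python
def odd_radii(s):
    """Return R with R[i] = maximal h such that s[i-h..i+h] is a palindrome.

    Classic Manacher for odd-length palindromes; it accesses the input
    only through equality comparisons of symbols.
    """
    n = len(s)
    R = [0] * n
    center = right = 0    # palindrome with rightmost end: s[2*center-right..right]
    for i in range(n):
        if i < right:
            # mirror position of i with respect to center is 2*center - i
            R[i] = min(R[2 * center - i], right - i)
        # extend the palindrome centered at i by explicit comparisons
        while (i - R[i] - 1 >= 0 and i + R[i] + 1 < n
               and s[i - R[i] - 1] == s[i + R[i] + 1]):
            R[i] += 1
        if i + R[i] > right:
            center, right = i, i + R[i]
    return R
```

Each successful comparison in the `while` loop advances `right`, which is what bounds the total number of comparisons linearly.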

Eliminating Twin-Palindromes.

It remains to show how to eliminate twin-palindromes after appending a symbol. For this part, let $\alpha[1\operatorname{{.}\,{.}}a]$ and $\beta[1\operatorname{{.}\,{.}}b]$ be the working strings after appending a symbol, i.e., we just transitioned from strings $\alpha[1\operatorname{{.}\,{.}}a)$ and $\alpha[a]\beta$ to $\alpha$ and $\beta$ . We have already ensured that $C$ , $R$ , and $E$ satisfy the claimed properties. We know that $\alpha[1\operatorname{{.}\,{.}}a)$ does not contain any twin-palindromes, and the only property we have to establish is that $\alpha$ contains no twin-palindromes, i.e., we are only interested in twin-palindromic suffixes of $\alpha$ . By Lemma 12(ii), the only potential twin-palindromic suffix of $\alpha$ is $\alpha[a-3k\operatorname{{.}\,{.}}a]$ with $k=a-c_{1}\geq 1$ . Since $\alpha[a-2k\operatorname{{.}\,{.}}a]$ is a palindrome, $\alpha[a-3k\operatorname{{.}\,{.}}a]$ is a twin-palindrome if and only if $\alpha[a-3k\operatorname{{.}\,{.}}a-k]$ is a palindrome (by Observation 5), which is the case if and only if $R[a-2k]\geq k$ . We have already computed $R[a-2k]$ because $a-2k=2c_{1}-a<c_{1}$ is not contained in $C$ . If $R[a-2k]<k$ (or $a\leq 2k$ ), then we terminate the eliminating subroutine in ${\mathcal{O}}(1)$ time.

Otherwise, i.e., if $R[a-2k]\geq k$ , we know that $\alpha[a-3k\operatorname{{.}\,{.}}a]$ is the only twin-palindrome in $\alpha$ . There are symbols $c, d$ and a string $x$ with $\alpha[a-3k\operatorname{{.}\,{.}}a]=cxd\tilde{x}cxd$ . We truncate $\alpha$ to $\alpha[1\operatorname{{.}\,{.}}a-2k]$ , i.e., we substitute $cxd\tilde{x}cxd$ with $c x d$ . Since $\alpha\beta$ does not contain unit-squares, and the substitution introduces no new substrings of length two, $\alpha[1\operatorname{{.}\,{.}}a-2k]\beta$ also does not contain unit-squares. Also, $\alpha[1\operatorname{{.}\,{.}}a-2k]$ generates $\alpha$ from left to right, and thus $\alpha[1\operatorname{{.}\,{.}}a-2k]\beta$ generates $\alpha\beta$ and consequently $T$ from left to right. Finally, $\alpha\beta$ does not contain twin-palindromes of length less than $s$ , and therefore neither does $\alpha[1\operatorname{{.}\,{.}}a-2k]$ or $\beta$ . However, their concatenation $\alpha[1\operatorname{{.}\,{.}}a-2k]\beta$ may contain twin-palindromes of length less than $s$ . We apply Theorem 8 with parameter $s$ to $\alpha[1\operatorname{{.}\,{.}}a-2k]$ and $\beta$ , resulting in values ${a^{\prime}}$ and ${b^{\prime}}$ such that $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]\beta({b^{\prime}}\operatorname{{.}\,{.}}b]$ generates $\alpha[1\operatorname{{.}\,{.}}a-2k]\beta$ (and hence also $\alpha\beta$ , and consequently $T$ ) from left to right, and contains neither unit-squares nor twin-palindromes of length at most $s$ . This takes ${\mathcal{O}}(s+{a^{\prime}}+{b^{\prime}})$ time.

We replace $\alpha$ with $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]$ and $\beta$ with $\beta({b^{\prime}}\operatorname{{.}\,{.}}b]$ . It remains to update $C$ , $R$ , and $E$ . Note that $a-2k=2c_{1}-a<c_{1}$ , and thus none of the positions $c_{1},\dots,c_{q}$ currently stored in $C$ lie within $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]$ . Hence we remove all elements from $C$ . Currently, each $R[i]$ with $i\in[1,a-2k-{a^{\prime}}]$ contains the radius of the longest palindrome centered at position $i$ with respect to $\alpha$ . If $i+R[i]<a-2k-{a^{\prime}}$ , then $R[i]$ is also the radius of the longest palindrome centered at position $i$ with respect to $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]$ , and thus we do not need to update such $R[i]$ . If, however, $i+R[i]\geq a-2k-{a^{\prime}}$ , then $i$ is the center of a suffix palindrome of $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]$ , and we need to add $i$ to the list $C$ . In this case, we do not care about the content of $R[i]$ , and hence we can leave the entire array $R$ unchanged.

We cannot afford to scan $R$ and check if $i+R[i]\geq a-2k-{a^{\prime}}$ for each position. Instead, we directly identify the positions that need to be added to $C$ by using $E$ . We collect all the elements stored in all the lists $E[j]$ with $j\in[a-2k-{a^{\prime}},a]$ . Let $i$ be one of these elements. If $i>a-2k-{a^{\prime}}$ , then we simply discard it. The remaining elements are exactly the positions $i$ for which $i$ is the center of a suffix palindrome of $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]$ . We sort them in increasing order, and place them into $C$ . Finally, we replace $E[a-2k-{a^{\prime}}]$ with an empty list.

Analyzing the Time Complexity.

If the eliminating subroutine does not finish in constant time, then we permanently reduce the length of the working strings by $2k+a^{\prime}+b^{\prime}$ , particularly by eliminating a twin-palindrome of length $3k+1$ from $\alpha\beta$ . Since $\alpha\beta$ does not contain a twin-palindrome of length less than $s$ , it holds $k=\Omega(s)$ . Hence we reduce the length by $\Omega(s+a^{\prime}+b^{\prime})$ , and we can afford the ${\mathcal{O}}(s+a^{\prime}+b^{\prime})$ time for applying Theorem 8. We also discard at most ${\mathcal{O}}(k+a^{\prime})$ elements that were collected from $E[a-2k-{a^{\prime}}\operatorname{{.}\,{.}}a]$ , which we can afford to do in ${\mathcal{O}}(k+a^{\prime})$ time because we reduced the length by $\Omega(k+a^{\prime})$ . The remaining elements correspond to suffix palindromes of $\alpha[1\operatorname{{.}\,{.}}a-2k-{a^{\prime}}]$ , and by Lemma 12(i) there are at most ${\mathcal{O}}(\log n)={\mathcal{O}}(\sqrt{s})$ of them. Hence we can afford to sort them and to add them to the list $C$ . During the entire algorithm execution, we add ${\mathcal{O}}(\frac{n}{\sqrt{s}})$ elements to $C$ in this way.

Over all iterations of the appending subroutine, we append additional ${\mathcal{O}}(n)$ elements to $C$ . The total time for the appending subroutine is linear in the number of elements added to $C$ , and hence it is ${\mathcal{O}}(n)$ , which concludes the proof. $\hfill\blacktriangleleft$
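The alternation between appending a symbol and eliminating a twin-palindromic suffix can be illustrated by a much simpler, naive Python sketch. It is not the algorithm of Theorem 13: it omits the data structures $C$ , $R$ , and $E$ as well as the application of Theorem 8, finds the longest palindromic suffix by brute force, and therefore runs in polynomial rather than linear time. It relies on Lemma 12(ii) to inspect only one candidate suffix per step, and on the observation that the truncated $\alpha$ is a prefix of the previous, twin-palindrome-free working string. The function name `eliminate` is hypothetical and indices are 0-based.

```python
def eliminate(T):
    """Naive left-to-right elimination of unit-squares and twin-palindromes.

    Returns the working string alpha as a list of symbols.
    """
    alpha = []
    for b in T:
        if alpha and alpha[-1] == b:
            continue                      # unit-square: the walk stays in place
        alpha.append(b)
        a = len(alpha)
        # longest palindromic suffix of odd length 2k+1 with k >= 1 (brute force)
        k = 0
        for r in range((a - 1) // 2, 0, -1):
            suffix = alpha[a - 2 * r - 1:]
            if suffix == suffix[::-1]:
                k = r
                break
        # by Lemma 12(ii), the only candidate is the suffix of length 3k+1
        if k >= 1 and a >= 3 * k + 1:
            left = alpha[a - 3 * k - 1:a - k]
            if left == left[::-1]:
                del alpha[a - 2 * k:]     # replace c x d rev(x) c x d by c x d
    return alpha
```

For example, `"".join(eliminate("abcbabc"))` yields `abc`: the whole input is a twin-palindrome with $c x d=abc$ .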

4.1 Obtaining the Recursive Algorithm

In this section, we show our ${\mathcal{O}}(n\log^{*}n)$ -time algorithm. As mentioned earlier, it is recursive: it partitions the input into blocks, eliminates twin-palindromes and unit-squares in them recursively, and then concatenates the blocks together to form the output. To concatenate the blocks together, we use the following procedure:

Lemma 14.

Let $\alpha_{1},\dots,\alpha_{k}$ be strings, where each string is of length $\leq m$ and contains no twin-palindromes. There is an algorithm that, in ${\mathcal{O}}(km)$ time, computes a string $\beta$ such that

(i)

$\beta$ generates $\alpha_{1}\alpha_{2}\dots\alpha_{k}$ from left to right, and
(ii)

$\beta$ contains no twin-palindrome of length at most $m$ , and
(iii)

if none of $\alpha_{1},\alpha_{2},\dots,\alpha_{k}$ contains unit-squares, then neither does $\beta$ .

Proof.

We proceed in $k$ steps numbered $1,2,\dots k$ . After running step $i$ , the algorithm has computed a string $\beta_{i}$ that generates $\alpha_{1}\alpha_{2}\dots\alpha_{i}$ from left to right. If none of $\alpha_{1},\alpha_{2},\dots,\alpha_{i}$ contains unit-squares, then neither does $\beta_{i}$ . Furthermore, $\beta_{i}$ contains no twin-palindromes of length at most $m$ , and it is stored in the first $\left\lvert\beta_{i}\right\rvert$ cells of an output array $A[1\operatorname{{.}\,{.}}km]$ .

In step $i=1$ , we write $\beta_{1}=\alpha_{1}$ to the output array, which trivially satisfies the claimed properties. In step $i$ with $i>1$ , we apply Theorem 8 with parameter $m$ to $\beta_{i-1}$ and $\alpha_{i}$ . This results in $n^{\prime}\in[0,\left\lvert\beta_{i-1}\right\rvert]$ and $m^{\prime}\in[0,\left\lvert\alpha_{i}\right\rvert]$ such that $\beta_{i-1}[1\operatorname{{.}\,{.}}\left\lvert\beta_{i-1}\right\rvert-n^{\prime}]\alpha_{i}(m^{\prime}\operatorname{{.}\,{.}}\left\lvert\alpha_{i}\right\rvert]$ generates $\beta_{i-1}\alpha_{i}$ (and thus also $\alpha_{1}\alpha_{2}\dots\alpha_{i}$ ) from left to right and does not contain a twin-palindrome of length at most $m$ . Hence we can use $\beta_{i}=\beta_{i-1}[1\operatorname{{.}\,{.}}\left\lvert\beta_{i-1}\right\rvert-n^{\prime}]\alpha_{i}(m^{\prime}\operatorname{{.}\,{.}}\left\lvert\alpha_{i}\right\rvert]$ , and we write $\alpha_{i}(m^{\prime}\operatorname{{.}\,{.}}\left\lvert\alpha_{i}\right\rvert]$ to $A(\left\lvert\beta_{i-1}\right\rvert-n^{\prime}\operatorname{{.}\,{.}}\left\lvert\beta_{i-1}\right\rvert+\left\lvert\alpha_{i}\right\rvert-n^{\prime}-m^{\prime}]$ . Now $A[1\operatorname{{.}\,{.}}\left\lvert\beta_{i-1}\right\rvert+\left\lvert\alpha_{i}\right\rvert-n^{\prime}-m^{\prime}]$ contains $\beta_{i}$ . We have already established that, if none of $\alpha_{1},\dots,\alpha_{i}$ contains unit-squares, then neither does $\beta_{i-1}$ , and in this case Theorem 8 ensures that also $\beta_{i}$ contains no unit-squares.

The time for step $i$ is dominated by applying Theorem 8, which takes ${\mathcal{O}}(m+n^{\prime}+m^{\prime})={\mathcal{O}}(m+\left\lvert\beta_{i-1}\right\rvert+\left\lvert\alpha_{i}\right\rvert-\left\lvert\beta_{i}\right\rvert)$ time. This sums to ${\mathcal{O}}(km)$ over all steps. $\hfill\blacktriangleleft$

We are now ready to present the main result of this section:

Corollary 15.

Let $T[1\operatorname{{.}\,{.}}n]$ be a string. There is a recursive algorithm that, in ${\mathcal{O}}(n\log^{*}n)$ time, computes a string $S$ that generates $T$ from left to right and contains neither twin-palindromes nor unit-squares.

Proof.

Assume that $T$ contains no unit-squares. This assumption is without loss of generality, as otherwise we can eliminate unit-squares with Lemma 6, and then show the correctness of the Corollary for the resulting shorter string. The algorithm operates in a recursive manner. We cut $T$ into $b=\left\lceil n/m\right\rceil$ blocks of length $m=\left\lfloor(\log_{2}n)^{2}\right\rfloor$ (the final block might be shorter). For each block $B_{i}$ , where $i\in[1,b]$ , we obtain a string $B_{i}^{\prime}$ that generates $B_{i}$ from left to right and contains neither twin-palindromes nor unit-squares. This is done by applying the algorithm recursively to each block except for the final one. For the final block $B_{b}$ , we obtain $B_{b}^{\prime}$ by applying Corollary 9 in ${\mathcal{O}}(m\log m)\subseteq{\mathcal{O}}(n)$ time instead. For $x\in\mathbb{N}^{+}$ , let $t(x)$ be the time needed by Corollary 15 for a length- $x$ string. Then the time needed for computing the reduced blocks is ${\mathcal{O}}(n)+\left\lfloor n/\left\lfloor(\log_{2}n)^{2}\right\rfloor\right\rfloor\cdot t\left(\left\lfloor(\log_{2}n)^{2}\right\rfloor\right)$ .

Next, we obtain a single string $T^{\prime}$ that generates $B_{1}^{\prime}B_{2}^{\prime}\dots B_{b}^{\prime}$ (and thus also $T$ ) from left to right and contains neither unit-squares, nor twin-palindromes of length at most $m$ . This takes ${\mathcal{O}}(b\cdot m)={\mathcal{O}}(n)$ time with Lemma 14. Finally, we apply Theorem 13 to $T^{\prime}$ , which takes ${\mathcal{O}}(n)$ time and results in a string $S$ that generates $T^{\prime}$ (and thus also $T$ ) from left to right. Furthermore, $S$ contains neither twin-palindromes nor unit-squares. The total time is $t(n)={\mathcal{O}}(n)+\left\lfloor n/\left\lfloor(\log_{2}n)^{2}\right\rfloor\right\rfloor\cdot t\left(\left\lfloor(\log_{2}n)^{2}\right\rfloor\right)$ , which resolves to $t(n)={\mathcal{O}}(n\log^{*}n)$ . (Every two levels of recursion, the block size decreases from some length $m$ to $\left\lfloor(\log_{2}\left\lfloor(\log_{2}m)^{2}\right\rfloor)^{2}\right\rfloor$ , which is less than $\log_{2}m$ if $m$ exceeds a sufficiently large constant. Hence $2\cdot\log^{*}n$ levels of recursion are sufficient to reach blocks of constant size.) $\hfill\blacktriangleleft$

5 Eliminating Twin-Palindromes in ${\mathcal{O}}(n)$ Time

Let $L$ be the function defined by $\forall x\in\mathbb{N}:L(x)=\left\lfloor(\log_{2}x)^{2}\right\rfloor$ . As seen in the proof of Corollary 15, there is a constant $c$ such that the algorithm runs in time

$t(n)=c\cdot n+\left\lfloor n/L(n)\right\rfloor\cdot t\left(L(n)\right)\leq c\cdot n+n/L(n)\cdot t\left(L(n)\right).$

Particularly, we reduce the problem from a single instance of size $n$ to at most $n/L(n)$ instances of size $L(n)$ each. Let $s=L(L(L(L(n))))=o(\log\log\log n)$ . After four levels of recursion, we have to solve at most $n/s$ instances of size $s$ . It then holds $t(n)\leq 4cn+n/s\cdot t(s)$ . If we solve each instance in ${\mathcal{O}}(s)$ time, then the total time is linear. If the string was given over a polynomial integer alphabet, then we could reduce the alphabet of each instance (in batch using radix-sorting), and then use a precomputed lookup table to solve each instance in constant time. The lookup table would store the solution to each possible instance of size $s$ . (This approach is commonly referred to as the “Four Russians” technique. Footnote 3: The technique originates from the work of Arlazarov, Dinitz, Kronrod, and Faradzhev [4], who proposed an efficient method for computing the transitive closure of a directed graph by precomputing solutions for small blocks of the adjacency matrix. The term “Four Russians” technique has since come to broadly refer to the use of precomputed solutions for small subproblems, typically of logarithmic size relative to the input. The name reflects the authors’ affiliation in Moscow, though not all were Russian.) However, we cannot use strings over general unordered alphabet to access lookup tables. Instead, in Lemma 16 (below), we adapt the technique such that the precomputation no longer only considers all possible instances of size $s$ , but also all algorithms that potentially solve such instances efficiently. Among all the algorithms, we choose the fastest one that actually works correctly. This way, we show that we can indeed achieve ${\mathcal{O}}(s)$ time per instance after performing an ${\mathcal{O}}(2^{2^{2^{s}}})=o(n)$ time preprocessing. This improves the time complexity of Corollary 15 to ${\mathcal{O}}(n)$ . By applying Lemma 7 as a postprocessing, we obtain the main result from Theorem 1.
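To get a feeling for how quickly the block size shrinks, the function $L$ and its four-fold composition can be evaluated directly. The following Python sketch is only a numerical illustration; the helper name `block_size_after` is hypothetical, and floating-point logarithms are used, which is unproblematic for the sample values below but not a substitute for exact arithmetic in general.

```python
import math


def L(x):
    """L(x) = floor((log2 x)^2), evaluated with floating-point log2."""
    return math.floor(math.log2(x) ** 2)


def block_size_after(n, levels=4):
    """Block size after the given number of recursion levels, i.e. L applied repeatedly."""
    s = n
    for _ in range(levels):
        s = L(s)
    return s
```

For $n=2^{64}$ , the block sizes on the first four levels are $4096$ , $144$ , $51$ , and $32$ , so the instances handled by Lemma 16 are tiny compared to $n$ .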

Lemma 16.

Let $n\in\mathbb{N}^{+}$ . After an ${\mathcal{O}}(2^{2^{2^{n}}})$ time preprocessing, we can answer the following type of query in ${\mathcal{O}}(n)$ time. Given a string $T[1\operatorname{{.}\,{.}}n]$ , output a string $S$ that contains neither twin-palindromes nor unit-squares and generates $T$ from left to right.

Proof.

For some unknown value $N\in\mathbb{N}^{+}$ , assume that there is an algorithm that solves a query by performing less than $N$ comparisons of symbols of the query string (running in arbitrary time). The algorithm can be implemented as a decision tree such that it answers queries in ${\mathcal{O}}(N)$ time. The decision tree is a binary tree with nodes on $N$ levels, i.e., with at most $2^{N}-1$ nodes. Each internal node has two children and is annotated with a position pair from $[1,n]\times[1,n]$ . Each leaf is annotated with a length $m\in[1,n]$ and a string $M\in[1,n]^{m}$ . A query $T$ is then answered as follows. We start at the root. As long as we are at an internal node, we perform the symbol comparison dictated by its annotation. If the outcome of the comparison is positive, we move to the left child, and otherwise to the right child. Once we reach a leaf labeled with $m$ and $M$ , we return the string $S[1\operatorname{{.}\,{.}}m]$ defined by $\forall i\in[1,m]:S[i]=T[M[i]]$ . This takes ${\mathcal{O}}(N)$ comparisons and time. By adding dummy comparisons, we can assume without loss of generality that the decision tree is complete.
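The query procedure on such a decision tree can be sketched in a few lines of Python. The encoding is an assumption made for illustration: an internal node is a tuple `(i, j, if_equal, if_different)` with 1-based positions, and a leaf is the list $M$ of positions (its length plays the role of $m$ ). The tree `TREE_N2` is a hand-made toy example for $n=2$ , not a tree produced by the preprocessing.

```python
def query(tree, T):
    """Answer a query by walking the decision tree from the root to a leaf.

    Internal node: (i, j, if_equal, if_different), comparing T[i] with T[j].
    Leaf: list M of 1-based positions; the answer is S[r] = T[M[r]].
    """
    node = tree
    while isinstance(node, tuple):
        i, j, if_equal, if_different = node
        node = if_equal if T[i - 1] == T[j - 1] else if_different
    return [T[p - 1] for p in node]


# Toy tree for n = 2: if both symbols are equal, output the first symbol only
# (eliminating the unit-square); otherwise output both symbols.
TREE_N2 = (1, 2, [1], [1, 2])
```

Note that the query string is accessed only through equality comparisons and through copying symbols at positions stored in the leaf, which is what makes the approach applicable to a general unordered alphabet.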

It remains to be shown how to compute the decision tree. Rather than directly obtaining the correct tree, we instead generate all the possible trees admitted by the domains of the annotations. We also generate all the possible length-$n$ strings over alphabet $[1,n]$. Note that every length-$n$ string over an arbitrary alphabet is isomorphic to one of these strings. For each decision tree, and for each possible query string, we run the query procedure. We then check if the output string contains neither twin-palindromes nor unit-squares, and actually generates the query string from left to right. Since there is an algorithm that requires fewer than $N$ comparisons, at least one of the decision trees must work for all possible input strings. As soon as we find such a tree, we terminate and use this tree to answer all future queries.
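The isomorphism claim can be illustrated with a small sketch that relabels a string over any alphabet into a string over $[1,n]$ using equality comparisons only. The function name `canonical_form` is hypothetical, and the quadratic scan is chosen for simplicity rather than efficiency.

```python
from typing import List, Sequence


def canonical_form(T: Sequence) -> List[int]:
    """Relabel T over [1, n]: each new symbol gets the next unused integer.

    Only equality tests on symbols are used, matching the general
    unordered alphabet model (no hashing, no ordering).
    """
    labels: List[int] = []
    next_label = 1
    for i, symbol in enumerate(T):
        for j in range(i):
            if T[j] == symbol:
                # Symbol seen before: reuse the label of its earlier occurrence.
                labels.append(labels[j])
                break
        else:
            # First occurrence of this symbol.
            labels.append(next_label)
            next_label += 1
    return labels
```

For example, `canonical_form("banana")` yields `[1, 2, 3, 2, 3, 2]`, and two strings with the same equality pattern, such as `"xyx"` and `"aba"`, map to the same result. A decision tree therefore behaves identically on a string and on its relabeled version.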

Now we analyze the preprocessing time. We will use $z=y^{y^{y}}$ with $y={n+N}$ as an upper bound. Given all of its annotations, we can generate a tree in ${\mathcal{O}}(2^{N}\cdot n)\subset{\mathcal{O}}(z)$ time. There are fewer than $(n^{n+1})^{(2^{N})}\subset{\mathcal{O}}(z)$ ways of annotating a tree, and they can be easily enumerated. Hence the trees are generated in ${\mathcal{O}}(z^{2})$ time. Generating the $n^{n}\subset{\mathcal{O}}(z)$ query strings takes ${\mathcal{O}}(n^{n+1})\subset{\mathcal{O}}(z)$ time. Given a tree and a query string, answering the query and checking if the output string is free of twin-palindromes and unit-squares takes ${\mathcal{O}}(N+n^{3})\subset{\mathcal{O}}(z)$ time (by testing each substring naively). We check if the output string actually generates the query string from left to right by trying all ${\mathcal{O}}(3^{n})$ possible walks in ${\mathcal{O}}(3^{n}\cdot n)\subset{\mathcal{O}}(z)$ time. By multiplying the number ${\mathcal{O}}(z)$ of possible trees with the number ${\mathcal{O}}(z)$ of possible instances and the time ${\mathcal{O}}(z)$ to verify that a tree works for a given instance, the overall preprocessing time is ${\mathcal{O}}(z^{3})$.

Since we do not know the value of $N$ in advance, we simply start with $N=1$ and keep increasing $N$ by one until the procedure succeeds, which increases the time to ${\mathcal{O}}(N\cdot z^{3})\subset{\mathcal{O}}(z^{4})$ . By Corollary 15, we know that $N\in{\mathcal{O}}(n\log^{*}n)$ suffices, which readily implies ${\mathcal{O}}(z^{4})\subset{\mathcal{O}}(2^{2^{2^{n}}})$ . (This would even hold for $N={\mathcal{O}}(\operatorname{\textnormal{poly}}(n))$ .) Finally, we terminate with the minimal value $N$ such that a query can be answered in fewer than $N$ symbol comparisons in the worst case, and hence the ${\mathcal{O}}(N)$ query time is optimal. In Appendix A, Lemma 17, we show that $N={\mathcal{O}}(n)$ comparisons are sufficient, which concludes the proof. $\hfill\blacktriangleleft$

References

[1] Amihood Amir and Itai Boneh. Dynamic palindrome detection. CoRR, abs/1906.09732, 2019. arXiv:1906.09732.
[2] Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition detection in a dynamic string. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019), pages 5:1–5:18, 2019. doi:10.4230/LIPICS.ESA.2019.5.
[3] Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest common substring made fully dynamic. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019), pages 6:1–6:17, 2019. doi:10.4230/LIPIcs.ESA.2019.6.
[4] V. L. Arlazarov, E. A. Dinitz, M. A. Kronrod, and I. A. Faradzhev. On economical construction of the transitive closure of an oriented graph. Dokl. Akad. Nauk SSSR, 194(3):487–488, 1970. (Original in Russian). URL: https://www.mathnet.ru/eng/dan35675.
[5] Jérémy Barbay, Johannes Fischer, and Gonzalo Navarro. LRM-trees: Compressed indices, adaptive sorting, and compressed permutations. Theor. Comput. Sci., 459:26–41, 2012. doi:10.1016/J.TCS.2012.08.010.
[6] Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM J. Comput., 48(2):481–512, 2019. doi:10.1137/17M112720X.
[7] Panagiotis Charalampopoulos. Data structures for strings in the internal and dynamic settings. PhD thesis, King’s College London (University of London), UK, 2020. doi:10.12681/eadd/57602.
[8] Jean-Pierre Duval, Thierry Lecroq, and Arnaud Lefebvre. Linear computation of unbordered conjugate on unordered alphabet. Theor. Comput. Sci., 522:77–84, 2014. doi:10.1016/J.TCS.2013.12.008.
[9] Jonas Ellert. Efficient string algorithmics across alphabet realms. PhD thesis, Technical University of Dortmund, Germany, 2024. doi:10.17877/DE290R-24255.
[10] Jonas Ellert and Johannes Fischer. Linear time runs over general ordered alphabets. In Proceedings of the 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), pages 63:1–63:16, 2021. doi:10.4230/LIPICS.ICALP.2021.63.
[11] Jonas Ellert, Pawel Gawrychowski, and Garance Gourdel. Optimal square detection over general alphabets. CoRR, abs/2303.07229, 2023. doi:10.48550/arXiv.2303.07229.
[12] Jonas Ellert, Paweł Gawrychowski, and Garance Gourdel. Optimal square detection over general alphabets. In Proceedings of the 34th Annual Symposium on Discrete Algorithms (SODA 2023), pages 5220–5242, 2023. doi:10.1137/1.9781611977554.ch189.
[13] Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster queries for longest substring palindrome after block edit. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019), pages 27:1–27:13, 2019. doi:10.4230/LIPIcs.CPM.2019.27.
[14] Zvi Galil and Kunsoo Park. Truly alphabet-independent two-dimensional pattern matching. In Proceedings of the 33rd Annual Symposium on Foundations of Computer Science (FOCS 1992), pages 247–256, 1992. doi:10.1109/SFCS.1992.267767.
[15] Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977. doi:10.1137/0206024.
[16] Dmitry Kosolobov. Finding the leftmost critical factorization on unordered alphabet. CoRR, abs/1509.01018, 2015. arXiv:1509.01018.
[17] Dmitry Kosolobov. Lempel-Ziv factorization may be harder than computing all runs. In Proceedings of the 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), pages 582–593, 2015. doi:10.4230/LIPICS.STACS.2015.582.
[18] Michael G. Main and Richard J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. J. Algorithms, 5(3):422–432, 1984. doi:10.1016/0196-6774(84)90021-X.
[19] Glenn K. Manacher. A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string. J. ACM, 22(3):346–351, 1975. doi:10.1145/321892.321896.
[20] Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. J. ACM, 49(1):16–34, 2002. doi:10.1145/505241.505243.
[21] Ian Pratt-Hartmann. Walking on words. CoRR, abs/2208.08913, 2022. doi:10.48550/arXiv.2208.08913.
[22] Ian Pratt-Hartmann. Walking on words. In Proceedings of the 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024), pages 25:1–25:17, 2024. doi:10.4230/LIPICS.CPM.2024.25.
[23] Robert E. Tarjan. Problems in data structures and algorithms. In Graph Theory, Combinatorics and Algorithms: Interdisciplinary Applications, pages 17–39. Springer US, Boston, MA, 2005. doi:10.1007/0-387-25036-0_2.
[24] Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest Lyndon substring after edit. In Proceedings of the 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018), pages 19:1–19:10, 2018. doi:10.4230/LIPIcs.CPM.2018.19.
[25] Hao Yuan and Mikhail J. Atallah. Data structures for range minimum queries in multidimensional arrays. In Proceedings of the 21st Annual Symposium on Discrete Algorithms (SODA 2010), pages 150–160, 2010. doi:10.1137/1.9781611973075.14.

Appendix A Eliminating Twin-Palindromes in ${\mathcal{O}}(n)$ Comparisons

Lemma 17.

Let $T[1\operatorname{{.}\,{.}}n]$ be a string. There is an algorithm that, using ${\mathcal{O}}(n)$ symbol comparisons, computes a string $S$ that generates $T$ from left to right and contains neither twin-palindromes nor unit-squares.

Limiting the Number of Oracle Queries with Positive Answer.

As explained in Section 2, all strings encountered by the algorithm have an explicit physical representation. If we encounter strings $S_{1}[1\operatorname{{.}\,{.}}m_{1}]$ and $S_{2}[1\operatorname{{.}\,{.}}m_{2}]$ , then they are physically stored as arrays $A_{1}[1\operatorname{{.}\,{.}}m_{1}]$ and $A_{2}[1\operatorname{{.}\,{.}}m_{2}]$ such that for $i\in[1,m_{1}],j\in[1,m_{2}]$ it holds $S_{1}[i]=T[A_{1}[i]]$ and $S_{2}[j]=T[A_{2}[j]]$ . Hence $S_{1}[i]=S_{2}[j]$ if and only if $T[A_{1}[i]]=T[A_{2}[j]]$ , i.e., we can perform each symbol comparison by asking exactly one oracle query to $T$ .

Recall that $T[1\operatorname{{.}\,{.}}n]$ itself has physical representation $B[1\operatorname{{.}\,{.}}n]$ with $\forall i\in[1,n]:B[i]=i$ . By manipulating $B$ , we can ensure that the algorithm asks at most $n-1$ oracle queries with positive answer. We will ensure that $\forall i\in[1,n]:T[i]=T[B[i]]$ at all times. Instead of directly issuing an oracle query, we test if $T[i]=T[j]$ with $i,j\in[1,n]$ as follows.

1. If $B[i]=B[j]$, we return that $T[i]=T[j]$.
2. Otherwise, we ask the oracle if $T[B[i]]=T[B[j]]$.
   (a) If $T[B[i]]\neq T[B[j]]$, we return that $T[i]\neq T[j]$.
   (b) If $T[B[i]]=T[B[j]]$, we scan $B$ and replace all occurrences of $B[j]$ with $B[i]$. We then return that $T[i]=T[j]$. (This naively implements a union-find data structure.)

In Case 2b, the number of distinct values in $B$ decreases. Hence we encounter the case at most $n-1$ times. This is the only case in which we ask an oracle query and obtain a positive answer, hence the total number of such queries will be at most $n-1$ . Throughout the remainder of the section, we will use the previously described strategy for all symbol comparisons. Hence we perform at most ${\mathcal{O}}(n)$ oracle queries with positive answer, and we only have to count the number of comparisons with negative answers.
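The comparison strategy above can be sketched as a small wrapper around the oracle. The class name `ComparisonOracle` and its counters are hypothetical and serve only to make the bound on positive answers observable. Positions are 1-based in the interface, as in the text, while the array `B` is stored 0-based.

```python
from typing import List, Sequence


class ComparisonOracle:
    """Answers T[i] = T[j] while counting the actual oracle queries."""

    def __init__(self, T: Sequence) -> None:
        self._T = T
        # B[i] = i initially: every position is its own representative.
        self.B: List[int] = list(range(len(T)))
        self.positive = 0  # oracle queries answered "equal"
        self.negative = 0  # oracle queries answered "different"

    def equal(self, i: int, j: int) -> bool:
        bi, bj = self.B[i - 1], self.B[j - 1]
        if bi == bj:
            # Case 1: already known to be equal, no oracle query needed.
            return True
        if self._T[bi] != self._T[bj]:
            # Case 2a: one oracle query with negative answer.
            self.negative += 1
            return False
        # Case 2b: one oracle query with positive answer; merge the classes
        # by a naive scan (this is the naive union-find from the text).
        self.positive += 1
        self.B = [bi if b == bj else b for b in self.B]
        return True
```

Every execution of Case 2b reduces the number of distinct values in `B` by one, so `positive` never exceeds $n-1$ regardless of how many comparisons are requested. On `"abab"`, for example, querying all pairs of positions triggers exactly two positive oracle answers.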

Eliminating Twin-Palindromes.

Assume that $T$ contains no unit-squares. This assumption is without loss of generality, as otherwise we can eliminate unit-squares with Lemma 6, and then show the correctness of the Lemma for the resulting shorter string. We proceed in ${\mathcal{O}}(n)$ phases numbered $1,2,\dots,K$ . Each phase consists of ${\mathcal{O}}(n)$ iterations. Before each iteration of phase $k\in[1,K]$ , each of the following conditions is satisfied:

1. We have computed a string $S_{k}[1\operatorname{{.}\,{.}}m]$ that generates $T$ from left to right and contains neither unit-squares nor twin-palindromes of length less than $3k+1$.
2. We have computed an array $R_{k}[1\operatorname{{.}\,{.}}m]$ that contains the maximal radius of the palindrome centered at each position of $S_{k}$, truncated to length $k$, i.e., for $i\in[1,m]$ it holds

   $R_{k}[i]=\max\{h\in[0,\min(i-1,m-i,k)]\mid S_{k}[i-h\operatorname{{.}\,{.}}i+h]\textnormal{ is a palindrome}\}.$
3. We have charged some entries of $R_{k}$. However, $R_{k}[i]$ cannot be charged if $R_{k}[i]=k$. We will use charged entries to account for symbol comparisons later.

As soon as we produce $S_{k}[1\operatorname{{.}\,{.}}m]$ with $3k+1>m$ , we know that $S_{k}$ contains no twin-palindromes. Now we describe how to implement an iteration of phase $k$ . The goal of phase $k$ is to eliminate twin-palindromes of length exactly $3k+1$ . By Observation 5, some substring $S_{k}[p\operatorname{{.}\,{.}}p+3k]$ is a twin-palindrome if and only if $S_{k}[p\operatorname{{.}\,{.}}p+2k]$ and $S_{k}[p+k\operatorname{{.}\,{.}}p+3k]$ are palindromes, which is the case if and only if $R_{k}[p+k]=k$ and $R_{k}[p+2k]=k$ . Hence, given $R_{k}$ , we can find a twin-palindrome of length $3k+1$ (or determine that none exists) without performing any symbol comparisons.
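The detection step can be sketched as follows. The function names `truncated_radii` and `find_twin_palindrome` are hypothetical. The first function computes $R_{k}$ naively from its definition (not with the charging scheme of the proof), and the second one scans $R_{k}$ without touching the string at all.

```python
from typing import List, Optional, Sequence


def truncated_radii(S: Sequence, k: int) -> List[int]:
    """R[i] = largest h <= k such that S[i-h..i+h] is a palindrome (naive)."""
    m = len(S)
    R = []
    for c in range(m):  # c is the 0-based center
        limit = min(c, m - 1 - c, k)
        h = 0
        while h < limit and S[c - h - 1] == S[c + h + 1]:
            h += 1
        R.append(h)
    return R


def find_twin_palindrome(R: Sequence[int], k: int) -> Optional[int]:
    """Return a 1-based p with R[p+k] = k and R[p+2k] = k, or None.

    No symbol comparisons are performed; only the radii are inspected.
    """
    m = len(R)
    for p in range(1, m - 3 * k + 1):  # ensures p + 3k <= m
        # R is stored 0-based, so 1-based position q lives at R[q - 1].
        if R[p + k - 1] == k and R[p + 2 * k - 1] == k:
            return p
    return None
```

On `S = "xacbcacb"` with `k = 2`, the radii are `[0, 0, 0, 2, 0, 2, 0, 0]` and `find_twin_palindrome` returns `2`, which corresponds to the twin-palindrome `"acbcacb"` of length $3k+1=7$.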

Performing a Final Iteration.

If we cannot find a twin-palindrome of length $3k+1$, then we immediately finish the iteration and proceed to the next phase $k+1$. A twin-palindrome cannot have length $3k+2$ or $3k+3$ for non-negative integer $k$, and thus $S_{k}$ contains no twin-palindromes of length less than $3(k+1)+1$. Hence we can use $S_{k+1}=S_{k}$, and we only have to compute $R_{k+1}$. For any position $i\in[1,m]$, if $R_{k}[i]<k$, then we can directly assign $R_{k+1}[i]=R_{k}[i]$ without performing any comparisons. In this case, we also transfer the charge (if any) from $R_{k}[i]$ to $R_{k+1}[i]$. If $R_{k}[i]=k$, then we check if $S_{k}[i-k-1]=S_{k}[i+k+1]$ (assuming $i-k-1\geq 1$ and $i+k+1\leq m$). If the answer is yes, we assign $R_{k+1}[i]=k+1$ (and do not charge the position). If the answer is no, then we assign $R_{k+1}[i]=k$ and charge position $i$. Note that we charge one previously uncharged position for each comparison with negative outcome. Later, we use the charges to count the overall number of comparisons.
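A final iteration can be sketched as below. The function name `advance_phase` and the representation of charges as a set of 1-based positions are assumptions made for illustration. Symbols are compared with plain `==` here instead of the oracle strategy described earlier.

```python
from typing import List, Sequence, Set, Tuple


def advance_phase(
    S: Sequence, R: Sequence[int], charged: Set[int], k: int
) -> Tuple[List[int], Set[int]]:
    """Compute R_{k+1} and the new charges from R_k when S_{k+1} = S_k."""
    m = len(S)
    R_next = list(R)             # entries with R[i] < k are copied unchanged
    charged_next = set(charged)  # existing charges are transferred as they are
    for i in range(1, m + 1):    # 1-based center position
        if R[i - 1] < k:
            continue
        lo, hi = i - k - 1, i + k + 1
        if lo < 1 or hi > m:
            # The palindrome touches a boundary; no comparison is possible.
            continue
        if S[lo - 1] == S[hi - 1]:
            R_next[i - 1] = k + 1  # positive outcome: radius grows, no charge
        else:
            charged_next.add(i)    # negative outcome: radius stays k, charge i
    return R_next, charged_next
```

For `S = "abcba"` with `R = [0, 0, 1, 0, 0]` and `k = 1`, the call yields `[0, 0, 2, 0, 0]` and no charges. For `S = "abcbd"` the radii stay unchanged and position `3` becomes charged, so the single negative comparison is paid for by a previously uncharged entry.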

Performing a Regular Iteration.

It remains to show how to proceed in phase $k$ if we actually find a twin-palindrome $S_{k}[p\operatorname{{.}\,{.}}p+3k]$. In this case, $S_{k}[p\operatorname{{.}\,{.}}p+k]$ generates $S_{k}[p\operatorname{{.}\,{.}}p+3k]$ from left to right, and thus $S_{k}[1\operatorname{{.}\,{.}}p+k]S_{k}(p+3k\operatorname{{.}\,{.}}m]$ generates $S_{k}$ from left to right. However, $S_{k}[1\operatorname{{.}\,{.}}p+k]S_{k}(p+3k\operatorname{{.}\,{.}}m]$ may contain twin-palindromes of length less than $3k+1$ that were not present in $S_{k}$. We apply Theorem 8 with parameter $3k$ to $S_{k}[1\operatorname{{.}\,{.}}p+k]$ and $S_{k}(p+3k\operatorname{{.}\,{.}}m]$ and obtain values $n^{\prime},m^{\prime}$ such that $S_{k}[1\operatorname{{.}\,{.}}p+k-n^{\prime}]S_{k}(p+3k+m^{\prime}\operatorname{{.}\,{.}}m]$ generates $S_{k}$ from left to right and contains neither unit-square nor twin-palindrome of length less than $3k+1$.
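The index arithmetic of the contraction can be illustrated with a short sketch. The function name `contract_twin_palindrome` is hypothetical, and the sketch covers only the removal of $S_{k}(p+k\operatorname{{.}\,{.}}p+3k]$. The subsequent clean-up via Theorem 8, which may trim a further $n^{\prime}$ and $m^{\prime}$ symbols, is not shown.

```python
from typing import Sequence


def contract_twin_palindrome(S: Sequence, p: int, k: int) -> Sequence:
    """Return S[1..p+k] S(p+3k..m] for a twin-palindrome S[p..p+3k].

    p is 1-based as in the text. The 1-based prefix S[1..p+k] is the slice
    S[:p+k], and the 1-based suffix S(p+3k..m] is the slice S[p+3k:].
    """
    return S[: p + k] + S[p + 3 * k :]
```

With `S = "xacbcacby"`, `p = 2` and `k = 2`, the result is `"xacby"`: the twin-palindrome `"acbcacb"` is replaced by its first $k+1$ symbols `"acb"`, and the string becomes shorter by $2k=4$ symbols.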

Let $\ell=p+k-n^{\prime}$ and $r=p+3k+m^{\prime}$. We will replace $S_{k}$ with $S_{k}^{\prime}=S_{k}[1\operatorname{{.}\,{.}}\ell]S_{k}(r\operatorname{{.}\,{.}}m]$. Therefore, we also have to replace $R_{k}$ with the appropriate array $R_{k}^{\prime}$. Note that any entry $R_{k}^{\prime}[i]$ depends solely on a short local substring $S_{k}^{\prime}[i-k\operatorname{{.}\,{.}}i+k]$. This allows us to copy all except ${\mathcal{O}}(k)$ entries from $R_{k}$ to $R_{k}^{\prime}$. More precisely:

- For $i\in[1,\ell-k]$, it holds $R_{k}[i]=R_{k}^{\prime}[i]$ due to $S_{k}[1\operatorname{{.}\,{.}}\ell]=S_{k}^{\prime}[1\operatorname{{.}\,{.}}\ell]$.
- For $i\in(r+k,m]$, it holds $R_{k}[i]=R_{k}^{\prime}[i-(r-\ell)]$ due to $S_{k}(r\operatorname{{.}\,{.}}m]=S_{k}^{\prime}(\ell\operatorname{{.}\,{.}}m-r+\ell]$.

The only missing entries are the $R_{k}^{\prime}[i]$ with $i\in([1,m-(r-\ell)]\cap(\ell-k,\ell+k])$ . We compute each such value from scratch. This requires only one comparison with negative outcome per value, or ${\mathcal{O}}(k)$ comparisons with negative outcome overall.

Finally, we have to handle the charged positions. We simply transfer the charges from $R_{k}[1\operatorname{{.}\,{.}}\ell-k]$ to $R_{k}^{\prime}[1\operatorname{{.}\,{.}}\ell-k]$, and from $R_{k}(r+k\operatorname{{.}\,{.}}m]$ to $R_{k}^{\prime}(\ell+k\operatorname{{.}\,{.}}m-(r-\ell)]$ (respectively only if $\ell>k$ and $r+k<m$). We have accounted for all charges, except for the ones in interval $R_{k}[\max(1,\ell-k+1)\operatorname{{.}\,{.}}\min(m,r+k)]$. This interval is of length at most $r-\ell+2k={\mathcal{O}}(k+n^{\prime}+m^{\prime})$. We will pay for the charges directly, without transferring them to the next iteration. In this iteration, we have to pay for the following comparisons with negative outcome:

- ${\mathcal{O}}(k+n^{\prime}+m^{\prime})$ comparisons for applying Theorem 8 with parameter $3k$, and
- ${\mathcal{O}}(k)$ comparisons for computing the entries of $R_{k}^{\prime}$ that cannot be copied from $R_{k}$, and
- ${\mathcal{O}}(k+n^{\prime}+m^{\prime})$ comparisons that were performed in previous iterations, stored as charges of entries in $R_{k}$ that could not be copied to $R_{k}^{\prime}$.

This sums to ${\mathcal{O}}(k+n^{\prime}+m^{\prime})$ comparisons with negative outcome, which we can afford because we permanently reduced the length of the string by $\left\lvert S_{k}\right\rvert-\left\lvert S_{k}^{\prime}\right\rvert=r-\ell=2k+n^{\prime}+m^{\prime}$. After the algorithm terminates, there are at most ${\mathcal{O}}(n)$ charges on the final $R_{k}$ that have not been accounted for. Hence the number of comparisons with negative outcome is ${\mathcal{O}}(n)$ overall, which concludes the proof.

