String Periods in the Order-Preserving Model

The order-preserving model (op-model, in short) was introduced quite recently but has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time $O(n)$, $O(n\log\log n)$, $O(n \log^2 \log n/\log \log \log n)$, $O(n\log n)$ depending on the type of periodicity. In the most general variant the number of different periods can be as big as $\Omega(n^2)$, and a compact representation is needed. Our algorithms require novel combinatorial insight into the properties of such periods.


Introduction
Study of strings in the order-preserving model (op-model, in short) is a part of the so-called non-standard stringology. It is focused on pattern matching and repetition discovery problems in the shapes of number sequences. Here the shape of a sequence is given by the relative order of its elements. The applications of the op-model include finding trends in time series which appear naturally when considering e.g. the stock market or melody matching of two musical scores; see [33]. In such problems periodicity plays a crucial role.
One motivation is given by the following scenario. Consider a sequence D of numbers that models a time series which is known to repeat the same shape every fixed period of time. For example, this could be certain stock market data or statistics data from a social network that is strongly dependent on the day of the week, i.e., repeats the same shape every consecutive week. Our goal is, given a fragment S of the sequence D, to discover such repeating shapes, called here op-periods, in S. We also consider some special cases of this setting. If the beginning of the sequence S is synchronized with the beginning of the repeating shape in D, we refer to the repeating shape as an initial op-period. If the synchronization takes place also at the end of the sequence, we call the shape a full op-period. Finally, we also consider sliding op-periods that describe the case when every factor of the sequence D repeats the same shape every fixed period of time.
Order-preserving model. Let a..b denote the set {a, . . . , b}. We say that two strings X = X[1] . . . X[n] and Y = Y[1] . . . Y[n] over an integer alphabet are order-equivalent (equivalent in short), written X ≈ Y, iff $X[i] < X[j] \iff Y[i] < Y[j]$ for all i, j ∈ 1..n. Order-equivalence is a special case of a substring consistent equivalence relation (SCER) that was defined in [38].
For a string S of length n, we can create a new string X of length n such that X[i] is equal to the number of distinct symbols in S that are not greater than S[i]. The string X is called the shape of S and is denoted by shape(S). It is easy to observe that two strings S, T are order-equivalent if and only if they have the same shape.
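The definition of the shape translates directly into code. The following Python sketch is only an illustration; the function names `shape` and `order_equivalent` are ours and do not come from the cited works. It computes shape(S) by ranking the distinct symbols and tests order-equivalence by comparing shapes.

```python
def shape(s):
    """Return shape(s): each symbol is replaced by the number of
    distinct symbols of s that are not greater than it."""
    rank = {v: r for r, v in enumerate(sorted(set(s)), start=1)}
    return [rank[v] for v in s]


def order_equivalent(x, y):
    """Two strings of equal length are order-equivalent iff their shapes agree."""
    return len(x) == len(y) and shape(x) == shape(y)


if __name__ == "__main__":
    print(shape([5, 1, 7, 5]))
    print(order_equivalent([5, 1, 7, 5], [20, 3, 90, 20]))
```

This sketch sorts the distinct symbols explicitly; under the polynomially bounded integer alphabet assumed in this paper, the sorting step could be replaced by a linear-time radix sort.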
Periods in the op-model. We consider several notions of periodicity in the op-model, illustrated by Fig. 1. We say that a string S has a (general) op-period p with shift s ∈ 0..p − 1 if and only if p < |S| and S is a factor of a string V_1 V_2 · · · V_k such that: |V_1| = · · · = |V_k| = p, V_1 ≈ · · · ≈ V_k, and S[s + 1..|S|] is a prefix of V_2 · · · V_k.
The shape of the op-period is shape(V_1). One op-period p can have several shifts; to avoid ambiguity, we sometimes denote the op-period as (p, s). We define Shifts_p as the set of all shifts of the op-period p.
An op-period p is called initial if 0 ∈ Shifts_p, full if it is initial and p divides |S|, and sliding if Shifts_p = 0..p − 1. Initial and sliding op-periods are particular cases of block-based and sliding-window-based periods for SCER, both of which were introduced in [38].
Models of periodicity. In the standard model, a string S of length n has a period p iff S[i] = S[i + p] for all i = 1, . . . , n − p. The famous periodicity lemma of Fine and Wilf [27] states that a "long enough" string with periods p and q has also the period gcd(p, q). The exact bound of being "long enough" is p + q − gcd(p, q). This result was generalized to an arbitrary number of periods [10,32,41].
Periods were also considered in a number of non-standard models. Partial words, which are strings with don't care symbols, possess quite interesting Fine-Wilf type properties, including probabilistic ones; see [5,6,7,39,40,31]. In Section 2, we make use of periodicity graphs introduced in [39,40]. In the abelian (jumbled) model, a version of the periodicity lemma was shown in [16] and extended in [8]. Also, algorithms for computing three types of periods analogous to full, initial, and general op-periods were designed [20,25,26,34,35,36]. In the computation of full and initial op-periods we use some number-theoretic tools initially developed in [34,35]. Remarkably, the fastest known algorithm for computing general periods in the abelian model has essentially quadratic time complexity [20,36], whereas for the general op-periods we design a much more efficient solution. A version of the periodicity lemma for the parameterized model was proposed in [2].
Op-periods were first considered in [38] where initial and sliding op-periods were introduced and direct generalizations of the Fine-Wilf property to these kinds of op-periods were developed. A few distinctions between the op-periods and periods in other models should be mentioned. First, "to have a period 1" becomes a trivial property in the op-model. Second, all standard periods of a string have the "sliding" property; the first string in Fig. 1 demonstrates that this is not true for op-periods. The last distinction concerns borders. A standard period p in a string S of length n corresponds to a border of S of length n − p, which is both a prefix and a suffix of S. In the order-preserving setting, an analogue of a border is an op-border, that is, a prefix that is equivalent to the suffix of the same length. Op-borders have properties similar to standard borders and can be computed in O(n) time [37]. However, it is no longer the case that a (general, initial, full, or sliding) op-period must correspond to an op-border; see [38].
Previous algorithmic study of the op-model. The notion of order-equivalence was introduced in [33,37]. (However, note the related combinatorial studies, originated in [23], on containment/avoidance of shapes in permutations.) Both [33,37] studied pattern matching in the op-model (op-pattern matching) that consists in identifying all consecutive factors of a text that are order-equivalent to a given pattern. We assume that the alphabet is integer and, as usual, that it is polynomially bounded with respect to the length of the string, which means that a string can be sorted in linear time (cf. [17]). Under this assumption, for a text of length n and a pattern of length m, [33] solve the op-pattern matching problem in O(n + m log m) time and [37] solve it in O(n + m) time. Other op-pattern matching algorithms were presented in [3,15].
An index for op-pattern matching based on the suffix tree was developed in [19]. For a text of length n it uses O(n) space and answers op-pattern matching queries for a pattern of length m in optimal, O(m) time (or O(m + Occ) time if we are to report all Occ occurrences). The index can be constructed in O(n log log n) expected time or O(n log^2 log n / log log log n) worst-case time. We use the index itself and some of its applications from [19].
Other developments in this area include a multiple-pattern matching algorithm for the op-model [33], an approximate version of op-pattern matching [29], compressed index constructions [13,22], a small-space index for op-pattern matching that supports only short queries [28], and a number of practical approaches [9,11,12,14,24].
Our results. We give algorithms to compute:
• all full op-periods in O(n) time;
• the smallest non-trivial initial op-period in O(n) time;
• all initial op-periods in O(n log log n) time;
• all sliding op-periods in O(n log log n) expected time or O(n log^2 log n / log log log n) worst-case time (and linear space);
• all general op-periods with all their shifts (compactly represented) in O(n log n) time and space. The output is the family of sets Shifts_p represented as unions of disjoint intervals. The total number of intervals, over all p, is O(n log n).
In the combinatorial part, we characterize the Fine-Wilf periodicity property (aka interaction property) in the op-model in the case of coprime periods. This result is at the core of the linear-time algorithm for the smallest initial op-period.
Structure of the paper. Combinatorial foundations of our study are given in Section 2. Then in Section 3 we recall known algorithms and data structures for the op-model and develop further algorithmic tools. The remaining sections are devoted to computation of the respective types of op-periods: full and initial op-periods in Section 4, the smallest non-trivial initial op-period in Section 5, all (general) op-periods in Section 6, and sliding op-periods in Section 7.

Fine-Wilf Property for Op-Periods
The following result was shown as Theorem 2 in [38]. Note that if p and q are coprime, then the conclusion is void, as every string has the op-period 1.

Theorem 1 ([38]). Let p > q > 1 and d = gcd(p, q). If a string S of length n ≥ p + q − d has initial op-periods p and q, it has initial op-period d. Moreover, if S has length n ≥ p + q − 1 and sliding op-periods p and q, it has sliding op-period d.
The aim of this section is to show a periodicity lemma in the case that gcd(p, q) = 1. A string which is strictly increasing, strictly decreasing, or constant, is called strictly monotone. A strictly monotone op-period of S is an op-period with a strictly monotone shape. Such an op-period is called increasing (decreasing, constant) if so is its shape. Clearly, any divisor of a strictly monotone op-period is a strictly monotone op-period as well. A string S is 2-monotone if S = S_1 S_2, where S_1, S_2 are strictly monotone in the same direction.

Preliminary Notation
Below we assume that n > p > q > 1. Let a string S = S[1..n] have op-periods (p, i) and (q, j). If there exists a number k ∈ 1..n − 1 such that k mod p = i and k mod q = j, we say that these op-periods are synchronized and k is a synchronization point (see Fig. 2).
Theorem 2. Let p > q > 1 and d = gcd(p, q). If op-periods p and q of a string S of length n ≥ p + q − 1 are synchronized, then S has op-period d, synchronized with them.
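The synchronization condition can be checked by direct enumeration. The sketch below is a naive illustration with a hypothetical function name; it lists all synchronization points of op-periods (p, i) and (q, j) in a string of length n, with shifts taken from 0..p − 1 and 0..q − 1 as in the definition of op-periods.

```python
def synchronization_points(n, p, i, q, j):
    """Return all k in 1..n-1 with k mod p == i and k mod q == j.

    The op-periods (p, i) and (q, j) of a string of length n are
    synchronized iff the returned list is non-empty.
    """
    return [k for k in range(1, n) if k % p == i and k % q == j]


if __name__ == "__main__":
    print(synchronization_points(12, 5, 1, 3, 2))
    print(synchronization_points(18, 8, 5, 5, 2))
```

For coprime p and q, the Chinese remainder theorem gives exactly one candidate k in every window of pq consecutive integers, so the scan could be replaced by a constant number of arithmetic operations; the linear scan is kept here for readability.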

Periodicity Theorem For Coprime Periods
For a string S, by trace(S) we denote a string X of length |S| − 1 over the alphabet {+, 0, -} such that X[i] = + if S[i] < S[i + 1], X[i] = 0 if S[i] = S[i + 1], and X[i] = - if S[i] > S[i + 1].
Observation 1. (1) A string is strictly monotone iff its trace is a unary string. (2) If a string S has an op-period (p, i), then trace(S)[k] = trace(S)[l] for all positions k, l of the trace such that k ≡ l (mod p) and k mod p ≠ i.
To study traces of strings with two op-periods, we use periodicity graphs (see Fig. 3 below) very similar to those introduced in [39,40] for the study of partial words with two periods. The periodicity graph G(n, p, i, q, j) represents all strings S of length n + 1 having the op-periods (p, i) and (q, j). Its vertex set 1..n is the set of positions of the trace trace(S). Two positions are connected by an edge iff they contain equal symbols according to Observation 1 (2). For convenience, we distinguish between p- and q-edges, connecting positions in the same residue class modulo p (resp., modulo q). The construction of G(n, p, i, q, j) is split in two steps: first we build a draft graph H(n, p, q) (see Fig. 3,a), containing all p- and q-edges for each residue class, and then delete all edges of the orange clique corresponding to the ith class modulo p and all edges of the blue clique corresponding to the jth class modulo q (see Fig. 3,b,c). If some vertices k, l belong to the same connected component of G = G(n, p, i, q, j), then trace(S)[k] = trace(S)[l] for every string S corresponding to G. In particular, if G is connected, then trace(S) is unary and S is strictly monotone by Observation 1(1).
Fig. 3: (c) periodicity graph G(17, 8, 5, 5, 2). Orange/blue are p-edges (resp., q-edges) and the vertices equal to i modulo p (resp., to j modulo q).
It turns out that the existence of two coprime op-periods makes a string "almost" strictly monotone.
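The connected components of a periodicity graph can be explored with a small union-find sketch. The code below is an illustration under the reading of the construction given above: vertices are the trace positions 1..n, positions in the same residue class modulo p (resp., q) are joined, and the classes of i modulo p and of j modulo q contribute no edges. The function name is hypothetical. Joining only consecutive members k and k + p of a class yields the same components as the full clique.

```python
def periodicity_graph_components(n, p, i, q, j):
    """Connected components of G(n, p, i, q, j) on the vertex set 1..n."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for period, shift in ((p, i), (q, j)):
        for k in range(1, n - period + 1):
            # Edges inside the deleted residue class are skipped.
            if k % period != shift % period:
                parent[find(k)] = find(k + period)

    groups = {}
    for v in range(1, n + 1):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    for component in periodicity_graph_components(17, 8, 5, 5, 2):
        print(component)
```

For the parameters (17, 8, 5, 5, 2) of Fig. 3,c the sketch reports two components, so the traces of the corresponding strings may use two different symbols.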
Theorem 3. Let S be a string of length n that has coprime op-periods p and q with shifts i and j, respectively, such that n > p > q > 1. Then: (a) if n > pq, then S has a strictly monotone op-period pq; (b) if 2p < n ≤ pq and the op-periods are synchronized, then S is 2-monotone; (c) if p+q < n ≤ 2p and the op-periods are synchronized, then (q, j) is a strictly monotone op-period of S; (d) if n > max{2p, p+2q} and the op-periods are not synchronized, then S is strictly monotone; (e) if n > 2p, the op-periods are not synchronized, and p is initial, then S is strictly monotone; (f) if p+q < n ≤ 2p and p is initial, then (q, j) is a strictly monotone op-period of S.
Proof. Take a string S of length n having op-periods p (with shift i) and q (with shift j). Let n′ = n − 1.
Consider the draft graph H(n′, p, q) (see Fig. 3,a). It consists of q q-cliques (numbered from 0 to q − 1 by residue classes modulo q) connected by some p-edges. If n′ = p + q, there are exactly q p-edges, which connect q-cliques in a cycle due to coprimality of p and q. Thus we have a cyclic order on q-cliques: for the clique k, the next one is (k + p) mod q. The number of p-edges connecting neighboring cliques increases with the number of vertices: if n′ ≥ 2p, every vertex has an adjacent p-edge, and if n′ ≥ p + 2q, every q-clique is connected to the next q-clique by at least two p-edges.
To obtain the periodicity graph G(n′, p, i, q, j), one should delete all edges of the ith p-clique and the jth q-clique from H(n′, p, q). First consider the effect of deleting p-edges. If the ith p-clique has at least three vertices, then after the deletion each q-clique will still be connected to the next one. Indeed, if we delete edges between i, i + p, and i + 2p, then there are still the edges (i + q, i + p + q) and (i + p − q, i + 2p − q), connecting the corresponding q-cliques. If the p-clique has a single edge, its deletion will break the connection between two neighboring q-cliques if they were connected by a single edge. This is not the case if n′ ≥ p + 2q, but may happen for any smaller n′; see Fig. 3,c, where n′ = p + 2q − 1.
Now look at the effect of only deleting q-edges from H(n′, p, q). If all vertices in the jth q-clique have p-edges (this holds for any j if n′ ≥ 2p), the graph after deletion remains connected; if not, it consists of a big connected component and one or more isolated vertices from the jth q-clique.
Finally we consider the cumulative effect of deleting p- and q-edges. Any synchronization point becomes an isolated vertex. In total, there are two ways of making the draft graph disconnected: break the connection between neighboring q-cliques distinct from the removed q-clique (Fig. 4,a) or get isolated vertices in the removed q-clique (Fig. 4,b). The first way does not work if n′ ≥ p + 2q (see above) or if the op-periods are synchronized (the removed p-edge was adjacent to the removed q-clique). For the second way, only synchronization points are isolated if n′ ≥ 2p (each vertex has a p-edge, see above). Note that in this case all non-isolated vertices of the periodicity graph are connected. Hence all positions of the trace trace(S), except for the isolated ones, contain the same symbol. So all factors of S involving no isolated positions are strictly monotone (in the same direction).
(c) all isolated positions are equal modulo q; (d) the condition on n excludes both ways to disconnect the draft graph; (e,f) for the initial op-period, i = p; if n ≤ 2p, there is no deletion of p-edges; if n > 2p, then the q-cliques connected by the edge (p, 2p) are also connected by (p − q, 2p − q); so only the disconnection by isolated positions is possible.

Algorithmic Toolbox for Op-Model
Let us start by recalling the encoding for op-pattern matching (op-encoding) from [19,37]. For a string S of length n and i ∈ 1..n we define: If there is no such j, then α_i(S) = 0. Similarly, we define: and β_i(S) = 0 if no such j exists. Then (α_1(S), β_1(S)), . . . , (α_n(S), β_n(S)) is the op-encoding of S. It can be computed efficiently as mentioned in the following lemma.
The op-encoding can be used to efficiently extend a match.
Lemma 2. Let X and Y be two strings of length n and assume that the op-encoding of X is known. If
Proof. Let i = α_n(X) and j = β_n(X). Lemma 3 from [19] asserts that, if i = j, then
.n] that is equivalent to a prefix of S. It is a direct analogue of the PREF array used in standard string matching (see [21]) and can be computed similarly in O(n) time using one of the standard encodings for the op-model that were used in [15,19,37]; see lemma below.
Proof. Let S be a string of length n. The standard linear-time algorithm for computing the PREF table for S (see, e.g., [21]) uses the following two properties of the
Let us mention an application of the op-PREF table that is used further in the algorithms. We denote by op-LPP_p(S) ("longest op-periodic prefix") the length of the longest prefix of a string S having p as an initial op-period.

Longest Common Extension Queries
For a string S, we define a longest common extension query op-LCP(i, j) in the order-preserving model as the maximum k such that S[i..i + k − 1] ≈ S[j..j + k − 1]. Similarly as in the standard model [18], LCP-queries in the op-model can be answered using lowest common ancestor (LCA) queries in the op-suffix tree; see the following lemma.
Proof. The order-preserving suffix tree (op-suffix tree) that is constructed in [19] is a compacted trie of op-encodings of all the suffixes of the text. In O(n log log n) expected time or O(n log^2 log n / log log log n) worst-case time one can construct a so-called incomplete version of the op-suffix tree in which each explicit node may have at most one edge whose first character label is not known. Fortunately, for op-LCP-queries the labels of the edges are not needed; the only required information is the depth of each explicit node and the location of each suffix. Therefore, for this purpose the incomplete op-suffix tree can be treated as a regular suffix tree and preprocessed using a standard lowest common ancestor data structure that requires additional O(n) preprocessing and can answer queries in O(1) time [4].
Op-squares were first defined in [19] where an algorithm computing all the sets op-Squares_p for a string of length n in O(n log n + Σ_p |op-Squares_p|) time was shown.

Order-preserving Squares
We say that an op-square S[i..i + 2p − 1] is right shiftable if S[i + 1..i + 2p] is an op-square and right non-shiftable otherwise. Similarly, we say that the op-square is left shiftable if S[i − 1..i + 2p − 2] is an op-square and left non-shiftable otherwise. Using the approach of [19], one can show the following lemma.
Claim (See Lemma 18 in [19]). All the right non-extendible op-squares in a string of length n can be computed in O(n log n) time.
Note that a right non-shiftable op-square is also right non-extendible, but the converse is not necessarily true. Thus it suffices to filter out the op-squares that are right shiftable. For this, for a right non-extendible op-square S[i..i + 2p − 1] we need to check if op-LCP(i + 1, i + p + 1) < p. This condition can be verified in O(1) time after o(n log n)-time preprocessing using Lemma 5.
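The shiftability test can be illustrated with a brute-force stand-in for op-LCP queries. The sketch below does not use the op-suffix tree; it recomputes shapes from scratch, so each query takes polynomial rather than constant time. It assumes the definition from [19] that S[i..i + 2p − 1] is an op-square when its two halves are order-equivalent, and all function names are hypothetical. Positions are 1-indexed as in the text.

```python
def shape(s):
    """Rank of each symbol among the distinct symbols of s."""
    rank = {v: r for r, v in enumerate(sorted(set(s)), start=1)}
    return [rank[v] for v in s]


def op_lcp(s, i, j):
    """Largest k such that s[i..i+k-1] and s[j..j+k-1] are order-equivalent."""
    n = len(s)
    k = 0
    # Try to extend the common extension by one symbol at a time.
    while max(i, j) + k <= n and shape(s[i - 1:i + k]) == shape(s[j - 1:j + k]):
        k += 1
    return k


def is_op_square(s, i, p):
    """Assumed definition: both halves of s[i..i+2p-1] have the same shape."""
    return i >= 1 and i + 2 * p - 1 <= len(s) and op_lcp(s, i, i + p) >= p


def is_right_shiftable(s, i, p):
    """The op-square at (i, p) is right shiftable iff op-LCP(i+1, i+p+1) >= p."""
    return i + 2 * p <= len(s) and op_lcp(s, i + 1, i + p + 1) >= p


if __name__ == "__main__":
    s = [1, 3, 2, 4, 9, 5, 2, 8, 3]
    print(is_op_square(s, 1, 3), is_right_shiftable(s, 1, 3))
```

The incremental extension in `op_lcp` is valid because order-equivalence of two factors implies order-equivalence of their prefixes of equal length.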

Computing All Full and Initial Op-Periods
For a string S of length n, we define op-PREF [i] for i = 0, . . . , n as: Here we assume that op-PREF[n + 1] = 0. In the computation of full and initial op-periods we heavily rely on this table according to the following obvious observation.

Computing Initial Op-Periods
Let us introduce an auxiliary array P[0..n] such that: Straight from Observation 2 we have:

Observation 3. p is an initial period of S if and only if P [p] ≥ p.
The table T could be computed straight from definition in O(n log n) time. We improve this complexity to O(n log log n) by employing Eratosthenes's sieve. The sieve computes, in particular, for each j = 1, . . . , n a list of all distinct prime divisors of j. We use these divisors to compute the table via dynamic programming in a right-to-left scan, as shown in Algorithm 1.
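The sieve step can be sketched as follows. This is a generic sieve of Eratosthenes that records, for each j, the list of its distinct prime divisors; it illustrates only the preprocessing mentioned above, not Algorithm 1 itself, and the function name is ours.

```python
def distinct_prime_divisors(n):
    """Return a list d of length n + 1 where d[j] holds the distinct
    prime divisors of j in increasing order (d[0] and d[1] are empty)."""
    divisors = [[] for _ in range(n + 1)]
    for p in range(2, n + 1):
        if not divisors[p]:  # no smaller prime divides p, hence p is prime
            for multiple in range(p, n + 1, p):
                divisors[multiple].append(p)
    return divisors


if __name__ == "__main__":
    d = distinct_prime_divisors(12)
    print(d[12], d[7], d[8])
```

The total length of all lists is the sum of the numbers of distinct prime divisors of 1, . . . , n, which is O(n log log n); this matches the time bound stated for the computation of all initial op-periods.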

Computing Full Op-Periods
Let us recall the following auxiliary data structure for efficient gcd-computations that was developed in [35]. We will only need a special case of this data structure to answer queries for gcd(x, n). Let Div (i) denote the set of all positive divisors of i. In the case of full op-periods we only need to compute P [p] for p ∈ Div (n). As in Algorithm 1, we start with T = op-PREF . Then we perform a preprocessing phase that shifts the information stored in the array from indices i ∈ Div (n) to indices gcd(i, n) ∈ Div (n). It is based on the fact that for d ∈ Div (n), d | i if and only if d | gcd(i, n). Finally, we perform right-to-left processing as in Algorithm 1. However, this time we can afford to iterate over all divisors of elements from Div (n). Thus we arrive at the pseudocode of Algorithm 2.
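For testing implementations of the algorithms of this section on small inputs, a quadratic brute-force reference may be convenient. The sketch below is not Algorithm 1 or Algorithm 2. It assumes a shape-based reading of the definitions: p < |S| is an initial op-period if every complete length-p block of S has the shape of S[1..p] and a trailing incomplete block has the shape of the prefix of S[1..p] of the same length; a full op-period is an initial op-period that divides |S|. The function names are hypothetical.

```python
def shape(s):
    """Rank of each symbol among the distinct symbols of s."""
    rank = {v: r for r, v in enumerate(sorted(set(s)), start=1)}
    return [rank[v] for v in s]


def is_initial_op_period(s, p):
    """Brute-force test under the shape-based assumption stated above."""
    n = len(s)
    if not 1 <= p < n:
        return False
    first = s[:p]
    for start in range(p, n, p):
        block = s[start:start + p]  # the last block may be shorter than p
        if shape(block) != shape(first[:len(block)]):
            return False
    return True


def initial_op_periods(s):
    return [p for p in range(1, len(s)) if is_initial_op_period(s, p)]


def full_op_periods(s):
    return [p for p in initial_op_periods(s) if len(s) % p == 0]


if __name__ == "__main__":
    s = [1, 3, 2, 4, 9, 5, 2, 8, 3]
    print(initial_op_periods(s), full_op_periods(s))
```

Each candidate p costs O(n log n) time here because of the repeated sorting inside `shape`, so the reference is only meant for cross-checking.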

Computing Smallest Non-Trivial Initial Op-Period
If a string is not strictly monotone itself, it has O(n) strictly monotone op-periods and they can all be computed in O(n) time. We use this as an auxiliary routine in the computation of the smallest initial op-period that is greater than 1.
Theorem 6. If a string of length n is not strictly monotone, all of its strictly monotone op-periods can be computed in O(n) time.
Proof. We show how to compute all the strictly increasing op-periods of a string S that is not strictly monotone itself; computation of strictly decreasing and constant op-periods is the same. Let S be a string of length n and let us denote X = trace(S). Let A = {a_1, . . . , a_k} be the set of all positions a_1 < · · · < a_k in X such that X[a_i] ≠ +; by the assumption of this theorem, we have that A ≠ ∅. This set provides a simple characterization of strictly increasing op-periods of S. First, assume that |A| = 1. By Observation 4, each p = 1, . . . , n is an op-period of S with the shift s = a_1 mod p. From now on we can assume that |A| > 1.
Let d_i = gcd(b_1, . . . , b_i). We want to compute d_k.
Note that d_i | d_{i−1} for all i = 2, . . . , k. Hence, the sequence (d_i) contains at most log n + 1 distinct values.
Hence, we can compute d_i using Euclid's algorithm in O(log n) time. The latter situation takes place at most log n + 1 times; the conclusion follows.
Consider the set B = {a_2 − a_1, a_3 − a_2, . . . , a_k − a_{k−1}}. By Observation 4, (p, s) is a strictly increasing op-period of S if and only if p | gcd(B) and s = a_1 mod p. Thus there is exactly one strictly increasing op-period of each length that divides gcd(B) and its shift is determined uniquely.
The value gcd(B) can be computed in O(n) time by the Claim. Afterwards, we find all its divisors and report the op-periods in O(√n) time.
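The characterization used in this proof can be sketched in a few lines. The code below assumes that A collects the trace positions whose symbol differs from +, and it reports a pair (p, s) for every p < |S| that divides gcd(B), with s = a_1 mod p; when |A| = 1 every p < |S| is reported. The function names are ours, and the divisors are found by a plain scan rather than in O(√n) time.

```python
from math import gcd


def trace(s):
    """Trace of s over {+, 0, -}: comparison of each symbol with its successor."""
    return ['+' if a < b else '-' if a > b else '0' for a, b in zip(s, s[1:])]


def strictly_increasing_op_periods(s):
    """Pairs (p, shift) of strictly increasing op-periods of s,
    for a string s that is not strictly increasing itself."""
    n = len(s)
    a = [k for k, sym in enumerate(trace(s), start=1) if sym != '+']
    if not a:
        raise ValueError("the string is strictly increasing")
    g = 0  # gcd of the empty set of differences is taken to be 0
    for prev, cur in zip(a, a[1:]):
        g = gcd(g, cur - prev)
    if g == 0:
        lengths = range(1, n)  # a single exceptional position: any p < n works
    else:
        lengths = [p for p in range(1, g + 1) if g % p == 0]
    return [(p, a[0] % p) for p in lengths]


if __name__ == "__main__":
    print(strictly_increasing_op_periods([1, 2, 3, 1, 2, 3, 1, 2]))
```

Since `math.gcd` runs Euclid's algorithm on every difference, the sketch spends O(log n) time per element in the worst case; the argument in the proof shows how to avoid this overhead.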
Let us start with the following simple property.
Lemma 7. The shape of the smallest non-trivial initial op-period of a string has no shorter non-trivial full op-period.
Proof. A full op-period of the initial op-period of a string S is an initial op-period of S.
Now we can state a property of initial op-periods, implied by Theorem 3, that is the basis of the algorithm.
Lemma 8. If a string of length n has initial op-periods p > q > 1 such that p + q < n and gcd(p, q) = 1, then q is strictly monotone.
Proof. Let us consider three cases. If n > pq, then by Theorem 3(a), both p and q are strictly monotone. If 2p < n ≤ pq, then Theorem 3(e) implies that S[1..pq − 1] is strictly monotone, hence p and q are strictly monotone as well. Finally, if p + q < n ≤ 2p, we have that q is strictly monotone by Theorem 3(f).
Proof. We follow the lines of Algorithm 3. If S is not strictly monotone itself, we can compute the smallest non-trivial strictly monotone initial op-period of S using Theorem 6. Otherwise, the smallest such op-period is 2. If S has a non-trivial strictly monotone initial op-period and the smallest such op-period is q > 1, then none of 2, . . . , q − 1 is an initial op-period of S. Hence, we can safely return q.
Let us now focus on the correctness of the while-loop. The invariant is that there is no initial op-period of S that is smaller than p. If the value of k = op-LPP_p(S) equals n, then p is an initial op-period of S and we can safely return it. Otherwise, we can advance p by 1. There is also no smallest initial op-period p′ such that p < p′ < k − p − 1. Indeed, Lemma 8 would imply that p is strictly monotone if gcd(p, p′) = 1 (which is impossible due to the initial selection of p) and Theorem 1 would imply an initial op-period of S[1..p′] that is smaller than p′ and divides p′ if gcd(p, p′) > 1 (which is impossible due to Lemma 7). This justifies the way p is increased. The case that p doubles can take place at most O(log n) times and the total sum of p over such cases is O(n).
Our goal is to compute a compact representation of all the op-periods of a string that contains, for each op-period p, an interval representation of the set Shifts_p.
For an integer set X, by X mod p we denote the set {x mod p : x ∈ X}. The following technical lemma provides efficient operations on interval representations of sets.
Lemma 9. (a) Assume that X and Y are two sets with interval representations of sizes x and y, respectively. Then the interval representation of the set X ∩ Y can be computed in O(x + y) time. (b) Assume that X_1, . . . , X_k ⊆ 0..n are sets with interval representations of sizes x_1, . . . , x_k and let p_1, . . . , p_k be positive integers. Then the interval representations of all the sets X_1 mod p_1, . . . , X_k mod p_k can be computed in O(x_1 + · · · + x_k + k + n) time.
Proof. To compute X ∩ Y in point (a), it suffices to merge the lists of endpoints of intervals in the interval representations of X and Y. Let L be the merged list. With each element of L we store a weight +1 if it represents the beginning of an interval and a weight −1 if it represents the endpoint of an interval. We compute the prefix sums of these weights for L. Then, by considering all elements with a prefix sum equal to 2 and their following elements in L, we can restore the interval representation of X ∩ Y.
Let us proceed to point (b). Note that, for an interval i..j, the set i..j mod p either equals 0..p − 1 if j − i ≥ p, or otherwise is a union of at most two intervals. For each interval i..j in the representation of X_a, for a = 1, . . . , k, we compute the interval representation of i..j mod p_a. Now it suffices to compute the union of these intervals for each X_a. This can be done exactly as in point (a) provided that the endpoints of the intervals comprising representations of i..j mod p_a are sorted. We perform the sorting simultaneously for all X_a using bucket sort [17]. The total number of endpoints is O(x_1 + · · · + x_k) and the number of possible values of endpoints is at most n. This yields the desired time complexity of point (b).
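The two basic operations from this proof can be sketched as follows. The code assumes that an interval representation is a sorted list of pairwise disjoint closed integer intervals (lo, hi); the names `intersect` and `interval_mod` are ours. The intersection follows the weighted sweep of point (a), with the closing event placed at hi + 1 so that ties are processed correctly. The simultaneous bucket sort of point (b) is not reproduced.

```python
from heapq import merge


def _events(intervals):
    """Sweep events: +1 where an interval starts, -1 just after it ends."""
    for lo, hi in intervals:
        yield (lo, +1)
        yield (hi + 1, -1)


def intersect(x, y):
    """Interval representation of X ∩ Y via a linear merge of endpoint lists."""
    result, depth, start = [], 0, None
    for pos, delta in merge(_events(x), _events(y)):
        depth += delta
        if depth == 2:          # both sets cover positions from pos onwards
            start = pos
        elif start is not None:  # coverage just dropped below 2
            result.append((start, pos - 1))
            start = None
    return result


def interval_mod(lo, hi, p):
    """Interval representation of {v mod p : lo <= v <= hi}."""
    if hi - lo + 1 >= p:
        return [(0, p - 1)]
    a, b = lo % p, hi % p
    return [(a, b)] if a <= b else [(0, b), (a, p - 1)]


if __name__ == "__main__":
    print(intersect([(0, 3), (6, 9)], [(2, 7)]))
    print(interval_mod(3, 5, 4))
```

The output of `intersect` may contain adjacent intervals such as (2, 3) and (4, 5); they are disjoint, which is all that the representation requires here.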
Lemma 10. For a string of length n, interval representations of the sets op-Squares_p for all 1 ≤ p ≤ n/2 can be computed in O(n log n) time.
Proof. Let us define the following two auxiliary sets. Let us note that, for each p, |L_p| = |R_p|. Thus let L_p = {ℓ_1, . . . , ℓ_k} and R_p = {r_1, . . . , r_k}. The interval representation of the set op-Squares_p is ℓ_1..r_1 ∪ · · · ∪ ℓ_k..r_k. Clearly, it can be computed in O(|L_p|) time.
We will use the following characterization of op-periods.
The sets are intersected using Lemma 9(a), and the algorithm returns Shifts_p for p = 1, . . . , n. Operations "mod" on sets are performed simultaneously using Lemma 9(b). All sets A_p, B_p, C_p have O(n log n)-sized representations. This guarantees O(n log n) time.

Computing Sliding Op-Periods
For a string S of length n, we define a family of strings SH 1 , . . . , SH n such that SH k Note that the characters of the strings are shapes. Moreover, the total length of strings SH k is quadratic in n, so we will not compute those strings explicitly. Instead, we use the following observation to test if two symbols are equal.
Sliding op-periods admit an elegant characterization based on SH_k; see Figure 5. If p ≤ n/2, then the former property yields SH_p[i] = SH_p[i + p] for every 1 ≤ i ≤ n − 2p + 1, i.e., that p is a period of SH_p.
For a proof in the other direction, suppose that p satisfies the characterization of Lemma 11. If p > n/2, this yields op-LCP(1, p + 1) = n − p = op-LCS(n − p, n). Otherwise, S[i..i + 2p − 1] is an op-square for every 1 ≤ i ≤ n − 2p + 1 and, in particular, op-LCP(1, p + 1) ≥ p and op-LCS(n − p, n) ≥ p. In either case the characterization of Observation 5 yields that p is a sliding op-period.
For a string X, we denote the shortest period of X by per(X). for each k ≤ k.
For a proof of (b), we observe that p and q are both periods of SH k [1.. ]. If p + q ≤ , then Periodicity Lemma implies p | q. Thus, SH k [ We introduce a two-dimensional   Proof. First, observe that each line is executed O(n) times. Indeed, we always have t ≤ n and t is decremented at most n times in Line 4, so the number of increments in Line 7 is O(n).
Each instruction takes constant time except for the conditions in Lines 3 and 6. The test in Line 3 can be implemented using a single op-LCP query (due to Observation 6). Checking the condition in Line 6 requires a more careful implementation exploiting the structure of the queries.
Suppose that the variable t has been changed in iterations 1 < · · · < m . For consistency, we also define
First, suppose that q = per(SH_{p_i}[1..ℓ_i]) = per(SH_{p_i}[p_i + 1..p_i + ℓ_i]) ≤ ℓ_i/3, i.e., we are in the first branch. If SH_{p_i}[1..q] = SH_{p_i}[p_i + 1..p_i + q], then we must have SH_{p_i}[1..ℓ_i] = SH_{p_i}[p_i + 1..p_i + ℓ_i], i.e., p_i is a period of SH_{p_i} = SH_{p_i}[1..p_i + ℓ_i] and p_i is a sliding op-period due to Lemma 11. Moreover, any sliding op-period p > p_i must be a period of SH_{p_i} (and, in particular, of SH_{p_i}[1..p_i + 2q]) due to Lemma 12(a). Consequently, p ≥ p_{i+1}, as claimed.
In the second branch we only need to prove that SH_{p_i}[1..ℓ_i] ≠ SH_{p_i}[p_i + 1..p_i + ℓ_i]. For a proof by contradiction, suppose that we have an equality. The condition from Line 6 means that the length-(3/4)ℓ_i prefix and suffix of SH_{p_i}[1..ℓ_i] = SH_{p_i}[p_i + 1..p_i + ℓ_i] have the common shortest period q ≤ (1/3)·(3/4)ℓ_i ≤ ℓ_i/4. The prefix and the suffix overlap by at least ℓ_i/2 characters, so we actually have q = per(SH_{p_i}[1..ℓ_i]) = per(SH_{p_i}[p_i + 1..p_i + ℓ_i]). Hence, in that case we would be in the first branch.
Finally, in the third branch we directly use Lemma 11 to check if p i is a sliding op-period. Moreover, if p > p i is also a sliding op-period, then p is a period of SH pi , i.e., p ≥ p i+1 . Proof. It suffices to bound the time complexity assuming that each op-LCP and op-LCS query takes unit time.
First, we observe that PER[k, ] and PER R [k, ] is used only for = n − 2k + 1 or = 3 4 (n − 2k + 1) . These O(n) values can be computed in O(n) time using Algorithm 5.
The condition in Line 4 can be verified using O(q) equality checks in SH p , whereas in Line 5, it suffices to compute the border table of SH p [1..2q]SH p [p + 1..p + 2q], which also takes O(q) time and equality checks in SH p . By a similar argument, the third branch can be implemented in O(|SH p |) = O(n − 2p + 1) time, whereas the second branch clearly takes O(1) time.
In order to prove that the total running time is O(n), we introduce a potential function. Let p_i be the value of the variable p at the beginning of the ith iteration, and let p′_i = min{p > p_i : p is a period of SH_{p_i}}. Note that p_i < p′_i ≤ |SH_{p_i}| due to p_i ≤ n/2. Moreover, p_{i+1} > p_i and p′_{i+1} ≥ p′_i by Lemma 12(a). Our potential function is φ_i = p_i + p′_i, i.e., we shall prove that the running time of the ith iteration is O(φ_{i+1} − φ_i).
The running time of the first branch is O(q), so we shall prove that φ i+1 −φ i ≥ q. Assume to the contrary that p i+1 − p i + p i+1 − p i < q. This yields that p i+1 < p i + q and p i+1 < p i + q. The first condition implies that SH pi [1.
Since q is a period of SH pi [p i+1 + 1..p i + i ] and of SH pi [1.
. i ], we conclude that p i+1 is a period of SH pi so p i+1 = p i . Due to Lemma 12(a), the condition p i+1 < p i + q = p i+1 + q implies The running time of the second branch is O(1) and we indeed have φ i+1 − φ i ≥ 1.
In the third branch, the running time is O(n − 2p + 1) and we shall prove that For a proof by contradiction, suppose that φ i+1 − φ i < 1 4 i . In this branch we have p i+1 = p i , so φ i+1 − φ i = p i+1 − p i . By Lemma 12(a), both p i and p i+1 are periods of SH pi . Hence, p i+1 − p i is a period of