Space-Efficient Online Computation of String Net Occurrences
Abstract
A substring of a string is said to be a repeat if occurs at least twice in . An occurrence of a repeat in is said to be a net occurrence if each of the substrings , , and occurs exactly once in . The occurrence of is said to be an extended net occurrence of . Let be an input string of length over an alphabet of size , and let denote the set of extended net occurrences of repeats in . Guo et al. [SPIRE 2024] presented an online algorithm which can report in in time, for each prefix of . Very recently, Inenaga [arXiv 2024] gave a faster online algorithm that can report in optimal time for each prefix of , where denotes the cardinality of a set . Both of the aforementioned data structures can be maintained in time and occupy space, where the -space requirement comes from the suffix tree data structure. In particular, Inenaga’s recent algorithm is based on Weiner’s right-to-left online suffix tree construction. In this paper, we show that one can modify Ukkonen’s left-to-right online suffix tree construction algorithm in space, so that can be reported in optimal time for each prefix of . This is an improvement over Guo et al.’s method that is also based on Ukkonen’s algorithm. Further, this leads us to the two following space-efficient alternatives:
-
A sliding-window algorithm of working space that can report in optimal time for each sliding window of size in .
-
A CDAWG-based online algorithm of working space that can report in optimal time for each prefix of , where is the number of edges in the CDAWG for .
All of our proposed data structures can be maintained in time for the input online string . We also discuss that the extended net occurrences of repeats in can be fully characterized in terms of the minimal unique substrings (MUSs) in .
Keywords and phrases:
string net occurrences, suffix trees, CDAWGs, maximal repeats, minimal unique substrings (MUSs)Funding:
Takuya Mieno: JSPS KAKENHI Grant Number JP24K20734.Copyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Mathematics of computing Combinatorial algorithmsEditors:
Paola Bonizzoni and Veli MäkinenSeries and Publisher:

1 Introduction
Finding repeats in a string is a fundamental task of string processing that has applications in various fields including bioinformatics, data compression, and natural language processing. This paper focuses on the notion of net occurrences of a repeat in a string, which has attracted recent attention. Let be a repeat in a string such that occurs at least twice in . An occurrence of a repeat in is said to be a net occurrence of if extending the occurrence to the left or to the right results in a unique occurrence, i.e., each of , , and occurs exactly once in . Finding string net occurrences are motivated for Chinese language text processing [13, 14]. The occurrence of is said to be an extended net occurrence of a repeat in , and let denote the set of all extended net occurrences of repeats in .
Guo et al. [5] were the first who considered the problem of computing (extended) net occurrences of repeats in a string from view points of string combinatorics and algorithmics. Guo et al. [5] showed a necessary and sufficient condition for a net occurrence of a repeat, which, since then, has played a core role in efficient computation of string net occurrences. For an input string of length , they gave an offline algorithm for computing in time and space for integer alphabets of size polynomial in , and in time and space for general ordered alphabets of size . Their offline method is based on the suffix array [15] and the Burrows-Wheeler transform [4]. Ohlebusch et al. gave another offline algorithm that works fast in practice [18].
Later, Guo et al. [6] proposed an online algorithm for computing all string net occurrences of repeats. Their algorithm maintains a data structure of space that reports in time for each prefix of an online input string of length . Since their algorithm computes all the net occurrences upon a query, their algorithm requires at least time to maintain and update the list of all (extended) net occurrences of repeats in an online string. Their algorithm is based on Ukkonen’s left-to-right online suffix tree construction [22], that is enhanced with the suffix-extension data structure of Breslauer and Italiano [3].
Very recently, Inenaga [8] proposed a faster algorithm that can maintain for an online string with growing in a total of time and space. Namely, this algorithm uses only amortized time to update to . The proposed algorithm is based on Weiner’s right-to-left online suffix tree construction [23] that is applied to the reversed input string, and can report all extended net occurrences of repeats in in optimal for each , where denotes the cardinality of a set .
In this paper, we first show that Ukkonen’s left-to-right online suffix tree construction algorithm can also be modified so that it can maintain and update in a total of time with space, and can report in optimal time for each . While this complexity of our Ukkonen-based method is the same as the previous Weiner-based method [8], our method enjoys the following merits:
-
(1)
Our result shows that the arguably complicated suffix-extension data structure of Breslauer and Italiano is not necessary for online computation of string net occurrences with Ukkonen’s algorithm.
- (2)
The first point is a simplification and improvement over Guo et al.’s method [6] based on Ukkonen’s construction. The second point leads us to the following space-efficient alternatives:
-
A sliding-window algorithm of working space that can be maintained in time and can report in optimal time for each sliding window of size in .
-
A CDAWG-based online algorithm of working space that can be maintained in time and can report in optimal time for each prefix of , where is the number of edges in the CDAWG for .
We note that always holds [2], and can be as small as for some highly repetitive strings [20, 19]. Finally, we also discuss that the extended net occurrences of repeats in a string can be fully characterized with the minimal unique substrings (MUSs) [7] in .
2 Preliminaries
Strings
Let be an alphabet of size . An element of is called a character. An element of is called a string. The empty string is the string of length . If holds for strings , and , then , and are called a prefix of , a substring of , and a suffix of , respectively. A prefix (resp. a suffix ) of is called a proper prefix (resp. a proper suffix) of if (resp. ). For a string , denotes the length of . For a string and an integer with , denotes the th character of . For a string and integers with , denotes the substring of starting at position and ending at position . For strings and , we say occurs in if holds for some . Also, if such exist, we denote by the occurrence of in . Also, we denote by the set of occurrences of in , i.e., . For any set , we denote by the cardinality of . For convenience, we assume that the empty string occurs times at the boundaries of consecutive characters in , and denote these inner occurrences by for . We also assume that occurs before the first character and after the last character of . Thus we have . A string is said to be unique in if . Also, is said to be quasi-unique in if . Further, is said to be repeating in if . We denote by (resp. ) the longest repeating suffix (resp. prefix) of . We denote by (resp. ) the shortest quasi-unique suffix (resp. prefix) of .
Maximal repeats and minimal unique substrings
A repeating substring of a string is also called a repeat in . A repeat in is said to be left-branching in if there are at least two distinct characters such that and . Symmetrically, a repeat is said to be right-branching in if there are at least two distinct characters such that and . A repeat of is said to be left-maximal in if is a left-branching repeat of or is a prefix of , and is said to be right-maximal in if is a right-branching repeat in or is a suffix of . A repeat of is said to be maximal in if is both a left-maximal repeat and a right-maximal repeat in . Let , , , , and denote the sets of left-branching, right-branching, left-maximal, right-maximal, and maximal repeats in , respectively. Note that , , and hold. A unique substring of a string is said to be a minimal unique substring (MUS) of if each of the substrings and is a repeat in . By definition, no MUS can be completely contained in another MUS in the string, and thus, there are at most MUSs in any string of length . Let denote the set of occurrences of all MUSs in .
In what follows, we fix a string of length arbitrarily.
(Extended) net occurrences
For a repeating substring of , its occurrence in is said to be a net occurrence of if is unique in , and is unique in . Let be the set of net occurrences in . We call a net unique substring (NUS) if . Let be the set of net unique substrings in . Then, we call the occurrence of NUS the extended net occurrence of the repeat . Let be the set of extended net occurrences in . Clearly, there is a one-to-one correspondence between and , i.e., iff . Note that holds.
Data structures
The suffix tree [23] of a string is a compacted trie that represents all suffixes of . More formally, the suffix tree of is a rooted tree such that (1) each edge is labeled by a non-empty substring of , (2) the labels of the out-edges of the same node begin with distinct characters, and (3) every suffix of is represented by a path from the root. If the path from the root to a node spells out the substring of , then we say that the node represents . By representing each edge label with a pair of positions such that , the suffix tree can be stored in space. For convenience, we identify each node with the string that the node represents. If is a node of the suffix tree with and , then the suffix link of the node points to the node .
There are two versions of suffix trees, implicit suffix trees [22] (a.k.a. Ukkonen trees) and explicit suffix trees [23] (a.k.a Weiner trees). In the implicit suffix tree of string , each repeating suffix of that has a unique right-extension (namely, and for any ) is represented on an edge. On the other hand, each such repeating suffix is represented by a non-branching internal node in the explicit suffix tree of . Let and denote the implicit suffix tree and the explicit suffix tree of string , respectively. The internal nodes of represent the right-branching repeats of , while the internal nodes of represent the right-maximal repeats of . It is thus clear that with a unique end-marker that does not occur in . Due to the nature of left-to-right online string processing, we will use the implicit suffix trees in our algorithms.
The compact directed acyclic word graph (CDAWG) [2] of a string is the smallest edge-labeled DAG that represents all substrings of . There are two versions of CDAWGs as well, implicit CDAWGs [9] and explicit CDAWGs [2]. The implicit CDAWG of string , denoted , is the edge-labeled DAG that is obtained by merging all isomorphic subtrees of the implicit suffix tree which are connected by suffix links. On the other hand, the explicit CDAWG of , denoted , is the edge-labeled DAG that is obtained by merging all isomorphic subtrees of the explicit suffix tree which are connected by suffix links. The internal nodes of have a one-to-one correspondence with the left-maximal and right-branching repeats in , while the internal nodes of have a one-to-one correspondence with the maximal repeats in . More precisely, the longest string represented by an internal node in is a left-maximal and right-branching repeat in , and the longest string represented by an internal node in is a maximal repeat in . Thus, as in the case of suffix trees, holds with a unique end-marker .
The length of a path in the implicit/explicit CDAWG is the total length of the labels of the edges in the path. An in-edge of a node in the implicit/explicit CDAWG is said to be a primary edge if it belongs to the longest path from the source to .
Due to the nature of left-to-right online string processing, we will use the implicit CDAWGs in our algorithm. Let and denote the number of edges of and , respectively. The following lemma guarantees that the worst-case space complexity of our implicit-CDAWG based algorithm is linear in the size of explicit CDAWGs:
Lemma 1.
For any string , .
Proof.
Let and be the sets of nodes of and , respectively. We identify each node of and of with the longest string that the node represents. Then, and . Since , we have that . Let , which implies that and . Let and be the nodes that represent in and , respectively. The out-degree of node in , as well as the out-degree of in , are equal to the number of right-extensions such that . Thus .
3 Changes in net occurrences for online string
In this section, we show how the net unique substrings, equivalently the extended net occurrences in a string can change when a character is appended. We will implicitly use the following fact throughout this section.
Fact 2.
For any strings , and character , holds.
Thus, if is repeating in , then it must be repeating in . Further, if is unique in and is not a suffix of , then must be unique in .
Next lemma characterizes net unique substrings to be deleted when a character is appended to string .
Lemma 3.
Suppose that is a net unique substring in where and . Then, (i) and or (ii) and hold.
Proof.
Since , is unique in , is unique in , and is repeating in . Also, since , or becomes repeating in . See also Figure 2. Note that and cannot be repeating in , because if we assume they become repeating in simultaneously, then both and are suffixes of and this implies , which contradicts that is unique in .
If is repeating in , then is a suffix of . Since is unique in , holds. Also, since is repeating in , holds. These imply that .
If is repeating in , then is a suffix of . Since is unique in , holds. Now let be the preceding character of the suffix of . If we assume that is repeating in , then the other occurrence matches the occurrence of since is unique in . This implies that , which contradicts is unique in . Thus, .
Lemma 4.
Suppose that is a net unique substring in such that is not a suffix of where and . Then, and hold.
Proof.
Since , is unique in , is unique in , and is repeating in . Also, since , is unique in . Namely, occurs in as a suffix and . See also Figure 3. Since is unique in , the preceding character of is not equal to , and thus, is unique in . Therefore, holds.
Lemma 5.
Suppose that is a net unique substring in such that is a suffix of where and . Then, either (i) or (ii) and where .
Proof.
Since , is unique in , is unique in , and is repeating in . There are two cases with respect to the number of occurrences of in . See also Figure 4.
If is repeating in , since is unique in . If is unique in , then appending causes to be repeating in . This implies that , i.e., . Also, since is unique in , holds. Thus .
As for the differences between and , the three following lemmas also hold by symmetry:
Lemma 6.
Suppose that is a net unique substring in where and . Then, (i) and or (ii) and hold.
Lemma 7.
Suppose that is a net unique substring in such that is not a prefix of where and . Then, and hold.
Lemma 8.
Suppose that is a net unique substring in such that is a prefix of where and . Then, either (i) or (ii) and where .
4 Algorithms
In this section, we present our online/sliding algorithms for computing extended net occurrences of repeats for a given string.
4.1 Online algorithm based on implicit suffix trees
By Lemmas 3, 4 and 5, we can compute from with Algorithm 1 since iff . We encode each element by a pair so that can be stored in space. Note that while can be . See lines 9–15 of Algorithm 1. The size of decreases by if we enter line 12, however, the size increases by at line 14. Thus .
From Algorithm 1, we obtain the following theorem:
Theorem 9.
Given string of length , , and character , we can compute in time using space if each of the following operations can be executed in time within space:
-
1.
Determine if .
-
2.
Find the non-suffix occurrence of in when .
-
3.
Determine if .
-
4.
Find the non-suffix occurrence of in when .
-
5.
Compute the lengths of and .
-
6.
Compute and determine if .
-
7.
Determine if for given pair .
-
8.
Insert an element into for given pair .
-
9.
Delete an element from if for given pair .
Proof.
Look at Algorithm 1. Line 2 uses operation 1, Line 3 uses operation 2, Lines 4 and 11 use operation 7, Lines 5 and 12 use operation 9, Line 8 uses operation 3, Line 9 uses operation 4, Lines 14, 18, and 20 use operation 8, Line 17 uses operation 5, and Line 19 uses operation 6. Thus, Algorithm 1 consists of the above nine operations and basic arithmetic operations. Note that the correctness of Theorem 9 does not depend on the data structure used. The next lemma holds if we utilize the implicit suffix tree by Ukkonen [22].
Proof.
We employ an implicit suffix tree [22] of online string enhanced with the active point, which represents , and the secondary active point, which represents as in [16]. According to [16], such an enhanced implicit suffix tree can support operations 1–6 in time. For the readers of this paper, we briefly explain how to perform those operations efficiently below:
-
Operation 5 is obvious since we maintain the active point for every step.
-
Operation 3 can be easily done in constant time by checking whether the active point locates on an edge towards a leaf or not.
-
Operation 1 can be done in constant time by using the (secondary) active points (due to Lemma 1 of [16]).
-
Operations 2 and 4 can be done by looking at the leaves under the (secondary) active points. For instance, look at the implicit suffix tree depicted in Figure 1. The secondary active point (the gray star), which represents the shortest quasi-unique suffix , is on an edge towards a leaf. Further, the leaf under the secondary active point represents the suffix of starting at position . Thus, occurs at position , which is the non-suffix occurrence of . Similarly, the non-suffix occurrence of the longest repeating suffix is position since the leaf under the active point (the white star) represents the suffix of starting at position .
-
Value defined in operation 6 can be easily maintained independent of the suffix tree.
As for operations 7–9, we implement set as a set of occurrences where each element is connected to the corresponding locus of the suffix tree. Since is unique in , the locus of is either the leaf corresponding to unique suffix or on the edge towards the leaf. Thus we can perform operations 7–9 in constant time via the leaves of the suffix tree, for given , which represents some unique substring of .
Finally, the data structure can be maintained in amortized time: Basically the amortized analysis is due to [22]. The secondary active point, which was originally proposed in [16], can also be maintained in a similar manner to the active point, and thus the amortized analysis for the secondary active point is almost the same as that for the active point in [22] (see [16] for the complete proof).
By wrapping up the above discussions, we obtain the following theorem:
Theorem 11.
We can compute the set of extended net occurrences of string of length given in an online manner in a total of time using space.
4.2 Sliding-window algorithm based on implicit suffix trees
By applying symmetric arguments of Theorem 9, we can design a sliding-window algorithm.
Lemma 12.
There exists a data structure of size that supports all nine operations of Theorem 9 in addition to their symmetric operations listed below in constant time.
-
1.
Determine if .
-
2.
Find the non-prefix occurrence of in when .
-
3.
Determine if .
-
4.
Find the non-prefix occurrence of in when .
-
5.
Compute the lengths of and .
-
6.
Compute and determine if where .
The data structure can be updated to either or in amortized time where is a character.
Proof.
The sliding suffix tree data structure of [16] supports all the operations in amortized time using space.
In case we perform the deletion of the leftmost character and the addition of the rightmost character simultaneously, then our algorithm works for a sliding-window of fixed size . On the other hand, our scheme is also applicable to a sliding-window of variable size. Thus we have the following:
Theorem 13.
We can maintain the set of extended net occurrences for a sliding window over string of length in a total of time using working space where is the maximum size of the window.
4.3 Online algorithm based on implicit CDAWGs
The next lemma is an adaptation of Lemma 10 which uses implicit CDAWGs in place of implicit suffix trees:
Lemma 14.
Proof.
Since the online implicit CDAWG construction algorithm [9] is based on Ukkonen’s implicit suffix tree construction, it also maintains the active point that indicates the locus corresponding to . While the locus can correspond to multiple substrings of (as the CDAWG is a DAG), we can retrieve in time by storing, in each node of , the length of the maximal repeat corresponding to . This is because the path that spells out from the source consists only of the primary edges (see [9] for more details). Since edge label is represented by an integer pair such that , we can obtain the non-suffix occurrence of in time (Line 9 in Algorithm 1).
The secondary active point that indicates the locus for can also be maintained on the implicit by adapting the algorithm from [16]. Let be the suffix of that is one-character shorter than . By definition, is the longest suffix of such that . Given the locus for on , one can check in time whether the substrings corresponding to occur at least 3 times, by checking the number of paths from to the sink and checking if the active point is in the subgraph under . Also, by definition, is the longest string represented by the locus . This tells us the length of as well. Thus, we can also maintain the secondary active point in a similar manner to the active point on the implicit CDAWG in amortized time per character, and we can obtain the non-suffix occurrence of in time (Line 3 in Algorithm 1).
The -space requirement follows from Lemma 1.
Theorem 15.
We can compute the set of extended net occurrences of string of length given in an online manner in a total of time using working space.
Proof.
The correctness and the time complexity follows from the above discussions.
5 Relating extended net occurrences and MUSs
In this section, we give a full characterization of the extended net occurrences of repeats in in terms of minimal unique substrings (MUSs) in . See also Figure 5 for illustration.
Lemma 16.
Let be the intervals that represent consecutive MUSs in , namely, there is no element in such that and . Then, is a net occurrence for repeat .
Proof.
Observe that any unique substring of must contain a MUS. Since does not contain a MUS, is a repeat in . Let and . Then, since any of , , and contains a MUS. A consequence of Lemma 16 is that a net occurrence cannot be contained in another net occurrence . This is because a MUS cannot be contained in another MUS. Another consequence is that two consecutive extended net occurrences in are overlapping.
Below we show the reversed version of Lemma 16:
Lemma 17.
Let . Let be consecutive elements in , namely, there is no element in such that and . Then, .
Proof.
By the definition of the extended net occurrences, there is a MUS with that ends at since is unique and is repeating in . Similarly, there is a MUS with that starts at . Here, for the sake of contradiction, we assume . If , there are at least two MUSs within range . If , there are at least two MUSs within range . In both cases, there exists some net occurrence within range by Lemma 16, which contradicts that and are consecutive elements in . Thus holds. Similarly, we can prove , hence .
Consider the case where and , namely is the shortest unique prefix (suPref) of , and is the leftmost extended net occurrence in . Again by the definition of the extended net occurrences, there is a MUS that begins at position . Let where and . Then, . Since is unique and since , there must exist a MUS that ends at position (see Figure 5.) Using a similar argument as above, it can be proven that these two MUSs are the same. Thus . The case where and is symmetric.
Theorem 18.
For any string ,
-
(1)
.
-
(2)
can be obtained from the sorted in optimal time.
-
(3)
can be obtained from the sorted , , and in optimal time.
We also have the following corollary for space-efficient computation of MUSs:
Corollary 19.
We can maintain the set of all MUSs of a string of length given in an online manner in a total of time using working space, where denotes the size of .
Proof.
By combining Theorem 15 and Lemmas 16 and 17, we obtain the corollary except for computation of . This can easily be maintained in the implicit CDAWG as follows. Let be the node of from which the primary edge to the sink stems out. We identify with the maximal repeat that the node represents. If the active point does not exist on this primary edge, then . If the active point lies on this primary edge leading to the sink, then , where is the offset of the active point on the primary edge from the node .
Corollary 20.
For any string , holds.
6 Conclusions and open questions
In this paper we presented how Ukkonen’s left-to-right online suffix tree construction can be used for online computation of string net frequency. Our main contributions are space-efficient algorithms for computing string net occurrences, one works in space in the sliding model for window-length , and the other works in space where denotes the size of the CDAWG of the input string . Both of our methods run in time and can report all (extended) net occurrences of repeats in the current string in output-optimal time. We also showed that computing the sorted list of extended net occurrences of repeats in a string is equivalent to computing the sorted list of minimal unique substrings (MUSs) in .
An intriguing open question is whether one can efficiently compute the extended net occurrences of repeats within space, where denotes the number of equal-character runs in the BWT of the input string. It is known that holds for any string [1]. The R-enum algorithm of Nishimoto and Tabei [17] is able to compute the set of MUSs in time with space, where denotes the machine word size of the word RAM model. However, it is unclear whether their algorithm can output a list of MUSs arranged in the sorted order of the beginning positions within space.
References
- [1] Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, and Mathieu Raffinot. Composite repetition-aware data structures. In CPM 2015, volume 9133 of Lecture Notes in Computer Science, pages 26–39. Springer, 2015. doi:10.1007/978-3-319-19929-0_3.
- [2] Anselm Blumer, Janet Blumer, David Haussler, Ross M. McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578–595, 1987. doi:10.1145/28869.28873.
- [3] Dany Breslauer and Giuseppe F. Italiano. On suffix extensions in suffix trees. Theor. Comput. Sci., 457:27–34, 2012. doi:10.1016/J.TCS.2012.07.018.
- [4] Michael Burrows and David J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, DIGITAL System Research Center, 1994.
- [5] Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel. Exploiting new properties of string net frequency for efficient computation. In CPM 2024, pages 16:1–16:16, 2024. doi:10.4230/LIPICS.CPM.2024.16.
- [6] Peaker Guo, Seeun William Umboh, Anthony Wirth, and Justin Zobel. Online computation of string net frequency. In SPIRE 2024, volume 14899 of Lecture Notes in Computer Science, pages 159–173. Springer, 2024. doi:10.1007/978-3-031-72200-4_12.
- [7] Lucian Ilie and William F. Smyth. Minimum unique substrings and maximum repeats. Fundam. Informaticae, 110(1-4):183–195, 2011. doi:10.3233/FI-2011-536.
- [8] Shunsuke Inenaga. Faster and simpler online computation of string net frequency. CoRR, abs/2410.06837, 2024. doi:10.48550/arXiv.2410.06837.
- [9] Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa, Giancarlo Mauri, and Giulio Pavesi. On-line construction of compact directed acyclic word graphs. Discret. Appl. Math., 146(2):156–179, 2005. doi:10.1016/J.DAM.2004.04.012.
- [10] Shunsuke Inenaga, Takuya Mieno, Hiroki Arimura, Mitsuru Funakoshi, and Yuta Fujishige. Computing minimal absent words and extended bispecial factors with CDAWG space. In IWOCA 2024, volume 14764 of Lecture Notes in Computer Science, pages 327–340. Springer, 2024. doi:10.1007/978-3-031-63021-7_25.
- [11] N. Jesper Larsson. Extended application of suffix trees to data compression. In DCC 1996, pages 190–199. IEEE Computer Society, 1996. doi:10.1109/DCC.1996.488324.
- [12] Laurentius Leonard, Shunsuke Inenaga, Hideo Bannai, and Takuya Mieno. Sliding suffix trees simplified. CoRR, abs/2307.01412, 2023. doi:10.48550/arXiv.2307.01412.
- [13] Yih-Jeng Lin and Ming-Shing Yu. Extracting Chinese frequent strings without dictionary from a Chinese corpus, its applications. J. Inf. Sci. Eng., 17(5):805–824, 2001. URL: http://www.iis.sinica.edu.tw/page/jise/2001/200109_07.html.
- [14] Yih-Jeng Lin and Ming-Shing Yu. The properties and further applications of Chinese frequent strings. In International Journal of Computational Linguistics & Chinese Language Processing, Volume 9, Number 1, February 2004: Special Issue on Selected Papers from ROCLING XV, pages 113–128, 2004.
- [15] Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935–948, 1993. doi:10.1137/0222058.
- [16] Takuya Mieno, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing minimal unique substrings for a sliding window. Algorithmica, 84(3):670–693, 2022. doi:10.1007/S00453-021-00864-1.
- [17] Takaaki Nishimoto and Yasuo Tabei. R-enum: Enumeration of characteristic substrings in bwt-runs bounded space. In CPM 2021, volume 191 of LIPIcs, pages 21:1–21:21, 2021. doi:10.4230/LIPICS.CPM.2021.21.
- [18] Enno Ohlebusch, Thomas Büchler, and Jannik Olbrich. Faster computation of Chinese frequent strings and their net frequencies. In SPIRE 2024, volume 14899 of Lecture Notes in Computer Science, pages 249–256. Springer, 2024. doi:10.1007/978-3-031-72200-4_19.
- [19] Jakub Radoszewski and Wojciech Rytter. On the structure of compacted subword graphs of Thue-Morse words and their applications. J. Discrete Algorithms, 11:15–24, 2012. doi:10.1016/J.JDA.2011.01.001.
- [20] Wojciech Rytter. The structure of subword graphs and suffix trees of Fibonacci words. Theor. Comput. Sci., 363(2):211–223, 2006. doi:10.1016/J.TCS.2006.07.025.
- [21] Martin Senft. Suffix tree for a sliding window: An overview. In WDS 2005, volume 5, pages 41–46, 2005.
- [22] Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995. doi:10.1007/BF01206331.
- [23] Peter Weiner. Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1–11. IEEE Computer Society, 1973. doi:10.1109/SWAT.1973.13.