Circular Dictionary Matching Using Extended BWT
Abstract
The dictionary matching problem involves preprocessing a set of strings (patterns) into a data structure that efficiently identifies all occurrences of these patterns within a query string (text). In this work, we investigate a variation of this problem, termed circular dictionary matching, where the patterns are circular, meaning their cyclic shifts are also considered valid patterns. Such patterns naturally occur in areas such as bioinformatics and computational geometry. Based on the extended Burrows-Wheeler Transformation (eBWT), we design a space-efficient solution for this problem. Specifically, we show that a dictionary of circular patterns of total length can be indexed in bits of space and support circular dictionary matching on a query text in time, where represents the size of the underlying alphabet and represents the output size.
Keywords and phrases:
String algorithms, Burrows-Wheeler transformation, suffix trees, succinct data structuresCopyright and License:
2012 ACM Subject Classification:
Theory of computation Pattern matchingFunding:
Supported by the US National Science Foundation (NSF) under Grant Numbers 2137057, 2434261 (R Shah) and 2315822 (S Thankachan).Acknowledgements:
We thank all the anonymous reviewers (and Mano Prakash Parthasarathi) for their valuable feedback, which helped improve this paper. We also thank Travis Gagie for pointing out the independent work by Cotumaccio [25].Editors:
Paolo Ferragina, Travis Gagie, and Gonzalo NavarroSeries and Publisher:
Open Access Series in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Text indexing is a fundamental problem in computer science, where we are given a long string (the text) for preprocessing into a data structure (the index) that supports efficient substring matching. Specifically, when a shorter string (the pattern) is provided as a query, the goal is to find all occurrences of the pattern as a substring within the text. Data structures such as suffix trees and suffix arrays are commonly used for this task [67, 53]. However, a major drawback of suffix trees and suffix arrays is their significant space requirement. To address this issue, compact and space-efficient solutions have been developed [28, 41, 62]. Among these, FM-index [28], Compressed Suffix Arrays [41], and Compressed Suffix Trees [62] are particularly notable due to their historical significance. We refer to [57] for a comprehensive survey on this topic.
Dictionary matching is an orthogonal problem to text indexing. Here, we are given a set of patterns (the dictionary) and our aim is to design a data structure capable of finding all occurrences of these patterns as substrings within a query text. Solutions to this problem have applications in areas such as intrusion detection and bioinformatics, where they are used to identify known DNA or protein sequences in genomic data. The classic solution to this problem is the Aho-Corasick automaton [1], which efficiently matches multiple patterns simultaneously. However, as before, this data structure also suffers from a high space requirement. To address this issue, compact and space-efficient solutions have been proposed. The current best succinct-space result is due to Belazzougui [10], where an xBWT-like technique [27] is applied for the succinct encoding of the Aho-Corasick automaton. We refer the reader to [55, 59, 16] for follow-up work in this direction, including an entropy-compressed solution [45]. For an an alternative solution based on sparse suffix trees, see [42]. A wide range of variations on this problem have been studied, including dynamic dictionary matching [3, 18, 46, 26, 65], online dictionary matching [35, 51, 6], dictionary matching in streaming model [36, 34, 37], dictionary matching with errors or gaps [5, 4, 24, 44, 47], internal dictionary matching [19], dictionary matching under parameterized or order-preserving models [52, 32, 33], 2D dictionary matching [2, 58], etc.
In this work, we investigate another variant of dictionary matching, termed circular dictionary matching [48]. Here the patterns in the given dictionary are circular, meaning their cyclic shifts are also considered valid patterns. For example, the set of cyclic shifts of abcd is , whereas that of abab is . Note that circular patterns arise naturally in applications in bioinformatics and computational geometry. For instance, the genomes of many viruses, such as the herpes simplex virus (HSV-1), exist as circular strings [64]. In computational geometry, polygons are often represented by listing the coordinates of their vertices in clockwise order. The problem of matching a circular pattern, or a collection of circular patterns, in a given text has been extensively studied from an algorithmic perspective [23, 22, 20, 7, 49, 9, 40, 21]. Our objective is to design a solution for the data-structural version of this problem.
Problem 1 (Circular Dictionary Matching [48]).
Given a set of circular patterns of total length on an alphabet , design a data structure (called an index) that supports the following query efficiently:
-
Input: A text over
-
Output: All occurrences of all circular patterns in . Specifically, every substring of that corresponds to a cyclic shift of any pattern in ; a substring can be denoted by the starting and ending position in the text. We use to denote the output size.
Problem 1 is equivalent to standard dictionary matching on an expanded dictionary , which includes all circular patterns in along with their cyclic shifts. Therefore, one approach is to index , which is clearly inefficient, as the sum of the sizes of all patterns in can be quadratic to that of in the worst case. To that end, we presented a succinct space index of space bits and query complexity in our earlier work [48], which we improve upon in this paper.
Theorem 2.
For the circular dictionary matching problem, there exists an index requiring space bits and query time . Here denotes the total length of circular patterns in the dictionary , , denotes the size of the alphabet set, denotes the text (and denotes its length) that comes as a query, and denotes the output size.
Our new solution is based on a structure similar to the FM-index [28], which is built from the extended Burrows-Wheeler Transformation () [54], an extension of the traditional Burrows-Wheeler Transformation (BWT) [15] that works over multiple strings [8, 13, 14, 17, 60]. We remark that the application of the to the space-efficient indexing of circular patterns is not new. For example, in recent work, Boucher et al. [14] showed how to construct an -based index for a collection of strings with full cyclic pattern matching functionality in compressed space. This problem is distinct from the one we address in this work; nonetheless, the techniques employed in this paper are closely similar to those in their work.
Remark.
In work parallel to ours, Cotumaccio [25] independently achieved an index with similar space-time complexity; specifically bits of space and query time.
2 Preliminaries
Let denote the underlying alphabet. For a string , we use to denote its length, to denote its substring when and an empty string when , where denotes concatenation. In addition, and . For an integer , denotes an empty string if , and otherwise, while denotes the concatenation of infinite copies of . The root of a string , is defined as the shortest string , such that for some integer ; we call primitive if . For example, are primitive, whereas is not. The result below follows from Fine and Wilf’s theorem [29] (also see Theorem 1 in [12]).
Lemma 3.
Let and be two distinct primitive strings. Then, the length of the longest common prefix of their infinite repetitions, and , is at most , where denotes the greatest common divisor. This implies that although and are infinite in length, their lexicographic order can be established by comparing a bounded number of characters.
Rank and Select Queries.
For a character , denotes the number of occurrences of in , denotes the location of -th occurrence of in , and . There exist different space-time trade-offs for supporting these operations. For example, a wavelet tree structure of space bits can support all three operations in time [39]. See [11, 38] for faster alternatives.
For our purpose, we use an -bit structure, which requires time for rank and only time for others. The idea is to use the following result (indexible dictionaries) by Raman et al. [61]: a binary string with number of 1s can be represented in bits and support in time, and in time, where if and is an arbitrary number otherwise. Therefore, for each , we maintain the following bit vectors as indexible dictionaries: , where iff . Let be the number of occurrences of in . Then the total space is bits. To enable the operations on , we employ the following connections: , , and .
Range Minimum Queries (RMQ).
Let be an array of numbers, we can design an bit structure that supports the following query [30] in constant time: input is a range and output is position , such that is the smallest element in . For answering RMQ, we do not need to explicitly store .
Tree Topology.
The topology of an ordinal tree can be encoded in linear number of bits and support various tree operations in time [63, 56]. The ones relevant to us are finding the parent of a node, the range of leaves (i.e., the first and last leaves) in the subtree of a node, and the lowest common ancestor (LCA) of two nodes. Here a node is referred to by its pre-order rank.
3 Extended Burrows-Wheeler Transformation and Related Structures
In this section, we formally define the extended BWT, along with data structures analogous to suffix trees and suffix arrays for circular strings, which we refer to as the extended suffix tree and extended suffix array. The definition of extended BWT is first proposed by Mantaci et al. [54]. For related concepts, different terminologies have been used in prior work; for example, the extended suffix array was referred to as the circular suffix array in [48, 43], and as the generalized conjugate array in recent work by Boucher et al. [14]. However, it is straightforward to observe that these structures are equivalent. As we will show, the succinct encoding of the extended suffix array follows naturally from the original ideas of the FM-index. However, the definition of the extended suffix tree, or the use of it to solve pattern matching problems, seem to be new. In the following, we will follow the terminologies in [48, 43] to define the extended BWT, as they align more naturally in defining (more importantly compressing) the extended suffix tree than those in the original paper [54]. Also, we remark that encoding certain components of the extended suffix tree requires non-trivial modifications of known results.
Let be a given set of primitive strings, where and . We call the -th cyclic shift of , where . Without loss of generality, we make an important assumption that no string in is a cyclic shift of another.111As we will see, our goal is to efficiently represent all cyclic shifts of strings in the collection. If two strings are cyclic shifts of each other, they share the same set of shifts. Therefore, it’s sufficient to store only one representative from each group of cyclically equivalent strings. Next, we define the following sets:
Note that is a collection of strings, which are infinite in length, but pairwise distinct; therefore, their lexicographic order is well-defined. Moreover, by Fine and Wilf’s theorem (see Lemma 3), the length of the longest common prefix between any two strings in is at most .
The Extended Suffix Tree (denoted by ) is a compacted trie of all strings in . It consists of leaves, say in the left-to-right order, which are enumerated according to the lexicographic order of the infinite strings in . The number of internal nodes is at most . For any node , denotes the concatenation of edge labels on the path from the root to and denotes the length of . Note that for all internal nodes from Lemma 3. However, each edge connecting to a leaf has a label with an infinite length. This is unlike the standard suffix tree. Yet, the tree topology is well-defined, because the lexicographic order of strings in is well defined222Note that this remains true even when each infinite string in is treated as a finite string by considering only its first characters.. We remark that the idea of having leaf edges with unspecified lengths is not new; e.g., see Ukkonen’s suffix tree construction algorithm [66]. The suffix range of a string is the maximal range where is a prefix of for all , and is NIL if no such exists. The range of leaves of a node is the suffix range of the string .
The Extended Suffix Array is an array of pairs such that , where , and the Extended BWT is a string, where is the last character of . We remark that the terminology used in the original definition of in [54] is slightly different (i.e., omega order), but there is no technical difference. Note that and take bits, whereas takes only bits.
As an example, consider the set , where and . Then consists of leaves, such that
-
1.
-
2.
-
3.
-
4.
-
5.
-
6.
-
7.
-
8.
-
9.
The corresponding is and is .
3.1 Succinct Representation of
We maintain in bits to support and operations efficiently as described in Section 2. Similar to the FM-index, we now implement LF mapping and backward search [28].
Last-to-Front (LF) Mapping.
Define , where deleting the first character of gives . The function can also be described in terms of as . For any character , . For all , this information can be stored explicitly in bits, and the operation takes only time. Therefore, the overall time for operation is constant.
Backward Search.
Given the suffix range of a string , we can compute the suffix range of for any in time as follows. First compute the following
If , then return as the suffix range of . Otherwise, conclude that the suffix range of does not exist. The time complexity comes from the time for rank operations.
To encode , we replace it with a sparse , where we store values only at some sampled positions. Specifically, we store an entry iff , where be a parameter. This reduces the number of values stored to , where is the number of indexed patterns; space is bits per value. Suppose , then , , etc., which means we can decode (if not explicitly stored) by iteratively applying (say times) until we arrive at a positions where is stored. We then obtain . The time complexity is and . By fixing , we obtain the following result.
Lemma 4.
There exists an -bit data structure that returns for any given in time .
3.2 Succinct Representation of
We encode and maintain the topology of in bits for supporting the relevant tree operations in time. To encode the values of , we modify the techniques introduced by Sadakane [62].
For all , define , where is the length of the longest common prefix of and the string in the set that comes next in the lexicographic order (say ). Note that longest common prefix of and shares at least characters. So, . By adding on both sides and rearranging, we have
Letting , we have
This means is non-decreasing and its range is . We store explicitly in bits and use unary encoding for the rest. Specifically, for each we store a binary string with time rank/select supported [61]. To find , we simply count the number of 0s before -th 1 and add . Subtracting from this value gives . Space required for a fixed is , where . Therefore, space over all ’s is bits.
We now present the last component for encoding values. Define array as follows. Suppose , then . We do not store explicitly, but an RMQ structure over it in bits. To compute , find , the range of in the subtree of and the position corresponding to the minimum in using an RMQ. Then obtain and . The time complexity (asymptotically) is equal to that of an query.
Lemma 5.
By associating an bit structure with , we can find of any given node in time .
This concludes our discussion of the main data structure. In the following section, we demonstrate how it can be used to solve the circular dictionary matching problem.
4 A Succinct Index for Circular Dictionary Matching
Traditional “non-circular” dictionary matching can be solved efficiently with a generalized suffix tree of the patterns, by successively finding the locus of each suffix of the query text within the suffix tree, and then reporting the patterns that appear as the prefix of the corresponding suffix. For circular patterns, we show that this can be done analogously with the extended suffix tree. Yet, there are some technical challenges. First, as patterns of different lengths can be represented by the same leaf, we need some care to verify if a pattern actually appears in a certain text position. Another, more difficult, challenge is to obtain the loci of the suffixes; a direct adaptation of the “non-circular” dictionary matching approach cannot bound the running time as desired. In the following, we will explain how such challenges can be solved.
Given a set of circular patterns to index. We first make a critical step of replacing each pattern with its lexicographically smallest cyclic shift. Denote the resulting set by . We then collect the roots of all ’s and call it . The first step ensures that is invariant of the cyclic shifts of ’s. Let and . Note that and .
4.1 The Data Structure
We construct and maintain the structures in Lemma 4 and Lemma 5 over the strings in . Additional components are below.
-
1.
We encode each pattern as a pair , where is the root of , equivalently . For each , we maintain a list of patterns (in encoded form) with being their root. Each list is sorted in the ascending order of the pattern lengths. Total space is bits. Therefore, given any , we can list all patterns with root and length in optimal time.
-
2.
We want to support the query , which reports all patterns (i.e., in encoded form), where and for some . Define an array , where is the length of the shortest pattern in , where . We do not store this array, but an RMQ structure over it in bits. With that, we support the query using the following standard procedure.
-
When , we decode , where and obtain the output from .
-
When , we find the position of the minimum element in using RMQ, and then make the query . If it returns NIL (i.e., output is empty), we conclude that is also NIL. Otherwise, we continue our search for more answers in the remaining parts of the array, specifically in and using queries and , recursively.
Let be the output size. Then it can be observed that the original query is split into subqueries. Therefore, the time complexity is .
-
-
3.
We mark a node in if it is the highest node (i.e., closest to root) for some , such that or a cyclic shift of it is a prefix of . With each node, we associate a bit, indicating if it is marked or not. Next, we want to support the following query: given an arbitrary node , list its marked ancestors. One could easily accomplish this by first finding all of its ancestors (via finding parent nodes iteratively, starting ) and then extracting those that are marked. But the time taken could be in the worst case. To bound the time in terms of the number of marked ancestors, we store the lowest marked ancestor (LMA) of the following sampled nodes explicitly: for all . This scheme requires extra bits and guarantees that any path towards the root with nodes contains a sampled node. Therefore, to list all the marked ancestors of , we traverse the path from to root as before with the difference that when we are at a sampled node, we jump to its LMA (i.e., skip all nodes in between) and continue. All marked ancestors will be visited and the total number of nodes visited (and total time) is bounded by times the number of marked ancestors.
Total space is bits.
4.2 Query Algorithm
We report all , where and (a cyclic shift of) occurs at position in , using the following steps.
-
1.
Find the maximum , such that there exists a node where is a prefix of and is the highest such node. Also, let be the range of leaves of .
-
2.
Report , if a cyclic shift of is a prefix of and its length is .
Note that the first step corresponds to finding the loci of in the extended suffix tree, for each . For traditional suffix tree approach, we find the loci in ascending order of . Here, we do so in the opposite manner, in descending order of . Intuitively, this change allows us to replace the “downward” traversal of tree edges in the extended suffix tree to “upward” traversals, where the latter can be implemented more efficiently with our auxiliary data structures. We now show how to implement these two steps efficiently.
4.2.1 Details of Step 1
Initialize , , and fix . Then we compute in the descending order of using backward search as follows. We have two cases.
-
1.
If has an occurrence in , then find the smallest number and the largest number in , such that . Then obtain and . The time complexity in this case is .
-
2.
If has no occurrence in , then we find the lowest ancestor of , say , such that has an occurrence in , where is the range of leaves below . Then find the smallest number and the largest number in , such that . Then obtain and . To find , we perform a linear search on the path from to root until we find a node satisfying the required condition. This requires rank queries, where is the number of nodes on the path from to . Therefore, the time for a particular value of is . When considering all values of , we obtain the following total time complexity.333An alternative way to find is to perform a binary search along the path from to the root. This method requires a logarithmic number of rank queries and takes time for a particular value of . However, it results in a higher overall time complexity compared to what we described above.
In summary, the time for step 1 for all values of combined is .
4.2.2 Details of Step 2
Find , where is the lowest marked ancestor of for all and is the number of marked ancestors of . Let be the range of leaves in the subtree of . We make the following queries and collect their answers.
-
, which returns the ’s corresponding to as the final output, where the leaves corresponding to them are below .
-
For , we perform the following two queries: and , which return the ’s corresponding to as the final output, where the leaves corresponding to them are below , but not below (this avoids reporting the same answer multiple times). For a fixed , the output size is at least from the definition of marked nodes.
The total time of Step 2 is , which thereby establishes the overall time complexity. This completes the proof of Theorem 2.
5 Discussion and Open Problems
We present a succinct index for the circular dictionary matching problem that supports queries efficiently in time. In our earlier work [48], we achieved a similar result under the assumption that all patterns are approximately the same length (i.e., ). There, this assumption was necessary for the efficient encoding of values of nodes in a suffix tree-like structure. Although we later managed to remove this restriction, it came at the cost of a higher query time of . The improved result in this paper builds on a novel use of the , combined with a careful adaptation of Sadakane’s technique for encoding values of nodes in the , as described in section 3.2.
We conclude with a list of problems that remain open for future research.
-
1.
In contrast to the best known succinct solution for the standard dictionary matching problem [10], which achieves a query time of , our solution has a query complexity of in the circular setting. Even the recent alternative solution by Cotumaccio [25] has the same query time. This raises an important and natural question: Can the query time be further improved while maintaining the same space bound? Although this does not seem immediate, we remark that a trade-off allowing faster query time might be possible by adapting techniques from [41]; for example, using bits of space and achieving query time, where is an arbitrarily small constant.
-
2.
Another important question is the efficient construction of our data structure. While the construction of the extended Burrows-Wheeler Transform is already a well-studied problem [8, 13] (also see [43]), the remaining challenge lies in designing the construction of the additional structures. While linear-space construction algorithms that require near-linear time appear achievable, achieving space efficiency (i.e., using small working space) and optimizing the polylogarithmic factors in time can be challenging. Additionally, introducing engineering solutions, possibly through heuristics, could lead to a practical approach, which would benefit from experimental analysis.
-
3.
A repetition-aware index for circular dictionary matching would be desirable. While it is relatively straightforward to reduce the space of the extended Burrows-Wheeler Transform by applying run-length compression [14], encoding the remaining components (specifically, the RMQ structures in Section 4.1) in a repetition-aware manner remains challenging. Addressing this may require a careful adaptation of the techniques from [31] or from very recent work in [50].
References
- [1] Alfred V. Aho and Margaret J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM, 18(6):333–340, 1975. doi:10.1145/360825.360855.
- [2] Amihood Amir and Martin Farach. Two-Dimensional Dictionary Matching. Inf. Process. Lett., 44(5):233–239, 1992. doi:10.1016/0020-0190(92)90206-B.
- [3] Amihood Amir, Martin Farach, Zvi Galil, Raffaele Giancarlo, and Kunsoo Park. Dynamic Dictionary Matching. J. Comput. Syst. Sci., 49(2):208–222, 1994. doi:10.1016/S0022-0000(05)80047-9.
- [4] Amihood Amir, Dmitry Keselman, Gad M. Landau, Moshe Lewenstein, Noa Lewenstein, and Michael Rodeh. Text Indexing and Dictionary Matching with One Error. J. Algorithms, 37(2):309–325, 2000. doi:10.1006/JAGM.2000.1104.
- [5] Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the Gap! – Online Dictionary Matching with One Gap. Algorithmica, 81(6):2123–2157, 2019. doi:10.1007/S00453-018-0526-2.
- [6] Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Online Recognition of Dictionary with One Gap. Inf. Comput., 275:104633, 2020. doi:10.1016/J.IC.2020.104633.
- [7] Tanver Athar, Carl Barton, Widmer Bland, Jia Gao, Costas S. Iliopoulos, Chang Liu, and Solon P. Pissis. Fast Circular Dictionary-Matching Algorithm. Math. Struct. Comput. Sci., 27(2):143–156, 2017. doi:10.1017/S0960129515000134.
- [8] Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Piatkowski. Constructing and Indexing the Bijective and Extended Burrows-Wheeler Transform. Inf. Comput., 297:105153, 2024. doi:10.1016/J.IC.2024.105153.
- [9] Carl Barton, Costas S. Iliopoulos, and Solon P. Pissis. Fast Algorithms for Approximate Circular String Matching. Algorithms Mol. Biol., 9:9, 2014. doi:10.1186/1748-7188-9-9.
- [10] Djamal Belazzougui. Succinct Dictionary Matching with No Slowdown. In Amihood Amir and Laxmi Parida, editors, Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings, volume 6129 of Lecture Notes in Computer Science, pages 88–100. Springer, 2010. doi:10.1007/978-3-642-13509-5_9.
- [11] Djamal Belazzougui and Gonzalo Navarro. Optimal Lower and Upper Bounds for Representing Sequences. ACM Trans. Algorithms, 11(4):31:1–31:21, 2015. doi:10.1145/2629339.
- [12] Silvia Bonomo, Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. Sorting conjugates and suffixes of words in a multiset. Int. J. Found. Comput. Sci., 25(8):1161, 2014. doi:10.1142/S0129054114400309.
- [13] Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, and Marinella Sciortino. Computing the Original eBWT Faster, Simpler, and with Less Memory. In Thierry Lecroq and Hélène Touzet, editors, String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Lille, France, October 4-6, 2021, Proceedings, volume 12944 of Lecture Notes in Computer Science, pages 129–142. Springer, 2021. doi:10.1007/978-3-030-86692-1_11.
- [14] Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, and Marinella Sciortino. r-Indexing the eBWT. Inf. Comput., 298:105155, 2024. doi:10.1016/J.IC.2024.105155.
- [15] Michael Burrows and David J. Wheeler. A Block-Sorting Lossless Data Compression Algorithm. SRC Research Report, 124, 1994.
- [16] Bastien Cazaux and Eric Rivals. Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, June 18-20, 2019, Pisa, Italy, volume 128 of LIPIcs, pages 24:1–24:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.CPM.2019.24.
- [17] Davide Cenzato and Zsuzsanna Lipták. A Theoretical and Experimental Analysis of BWT Variants for String Collections. In Hideo Bannai and Jan Holub, editors, 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27-29, 2022, Prague, Czech Republic, volume 223 of LIPIcs, pages 25:1–25:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.CPM.2022.25.
- [18] Ho-Leung Chan, Wing-Kai Hon, Tak Wah Lam, and Kunihiko Sadakane. Dynamic Dictionary Matching and Compressed Suffix Trees. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, Vancouver, British Columbia, Canada, January 23-25, 2005, pages 13–22. SIAM, 2005. URL: http://dl.acm.org/citation.cfm?id=1070432.1070436.
- [19] Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Internal Dictionary Matching. Algorithmica, 83(7):2142–2169, 2021. doi:10.1007/S00453-021-00821-Y.
- [20] Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszynski, Tomasz Walen, and Wiktor Zuba. Circular Pattern Matching with Mismatches. In Leszek Antoni Gasieniec, Jesper Jansson, and Christos Levcopoulos, editors, Fundamentals of Computation Theory - 22nd International Symposium, FCT 2019, Copenhagen, Denmark, August 12-14, 2019, Proceedings, volume 11651 of Lecture Notes in Computer Science, pages 213–228. Springer, 2019. doi:10.1007/978-3-030-25027-0_15.
- [21] Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszynski, Tomasz Walen, and Wiktor Zuba. Circular Pattern Matching with Mismatches. J. Comput. Syst. Sci., 115:73–85, 2021. doi:10.1016/J.JCSS.2020.07.003.
- [22] Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Walen, and Wiktor Zuba. Approximate Circular Pattern Matching. In Shiri Chechik, Gonzalo Navarro, Eva Rotenberg, and Grzegorz Herman, editors, 30th Annual European Symposium on Algorithms, ESA 2022, September 5-9, 2022, Berlin/Potsdam, Germany, volume 244 of LIPIcs, pages 35:1–35:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ESA.2022.35.
- [23] Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen, and Wiktor Zuba. Approximate Circular Pattern Matching Under Edit Distance. In Olaf Beyersdorff, Mamadou Moustapha Kanté, Orna Kupferman, and Daniel Lokshtanov, editors, 41st International Symposium on Theoretical Aspects of Computer Science, STACS 2024, March 12-14, 2024, Clermont-Ferrand, France, volume 289 of LIPIcs, pages 24:1–24:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.STACS.2024.24.
- [24] Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary Matching and Indexing with Errors and Don’t Cares. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 91–100. ACM, 2004. doi:10.1145/1007352.1007374.
- [25] Nicola Cotumaccio. Improved circular dictionary matching. In Paola Bonizzoni and Veli Mäkinen, editors, 36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025, June 17-19, 2025, Milan, Italy, volume 331 of LIPIcs, pages 18:1–18:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPICS.CPM.2025.18.
- [26] Guy Feigenblat, Ely Porat, and Ariel Shiftan. An Improved Query Time for Succinct Dynamic Dictionary Matching. In Alexander S. Kulikov, Sergei O. Kuznetsov, and Pavel A. Pevzner, editors, Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Moscow, Russia, June 16-18, 2014. Proceedings, volume 8486 of Lecture Notes in Computer Science, pages 120–129. Springer, 2014. doi:10.1007/978-3-319-07566-2_13.
- [27] Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, and S. Muthukrishnan. Structuring Labeled Trees for Optimal Succinctness, and Beyond. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), 23-25 October 2005, Pittsburgh, PA, USA, Proceedings, pages 184–196. IEEE Computer Society, 2005. doi:10.1109/SFCS.2005.69.
- [28] Paolo Ferragina and Giovanni Manzini. Indexing Compressed Text. J. ACM, 52(4):552–581, 2005. doi:10.1145/1082036.1082039.
- [29] Nathan J Fine and Herbert S Wilf. Uniqueness Theorems for Periodic Functions. Proceedings of the American Mathematical Society, 16(1):109–114, 1965.
- [30] Johannes Fischer and Volker Heun. Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays. SIAM J. Comput., 40(2):465–492, 2011. doi:10.1137/090779759.
- [31] Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. J. ACM, 67(1):2:1–2:54, 2020. doi:10.1145/3375890.
- [32] Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. A Framework for Designing Space-Efficient Dictionaries for Parameterized and Order-Preserving Matching. Theor. Comput. Sci., 854:52–62, 2021. doi:10.1016/J.TCS.2020.11.036.
- [33] Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving Succinct Data Structures for Parameterized Pattern Matching and Related Problems. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 397–407. SIAM, 2017. doi:10.1137/1.9781611974782.25.
- [34] Pawel Gawrychowski and Tatiana Starikovskaya. Streaming Dictionary Matching with Mismatches. Algorithmica, 84(4):896–916, 2022. doi:10.1007/S00453-021-00876-X.
- [35] Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Dynamic Dictionary Matching in the Online Model. In Zachary Friggstad, Jörg-Rüdiger Sack, and Mohammad R. Salavatipour, editors, Algorithms and Data Structures - 16th International Symposium, WADS 2019, Edmonton, AB, Canada, August 5-7, 2019, Proceedings, volume 11646 of Lecture Notes in Computer Science, pages 409–422. Springer, 2019. doi:10.1007/978-3-030-24766-9_30.
- [36] Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 65:1–65:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.ICALP.2018.65.
- [37] Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In Kirk Pruhs and Christian Sohler, editors, 25th Annual European Symposium on Algorithms, ESA 2017, September 4-6, 2017, Vienna, Austria, volume 87 of LIPIcs, pages 41:1–41:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.ESA.2017.41.
- [38] Alexander Golynski, J. Ian Munro, and S. Srinivasa Rao. Rank/Select Operations on Large Alphabets: A Tool for Text Indexing. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 368–373. ACM Press, 2006. URL: http://dl.acm.org/citation.cfm?id=1109557.1109599.
- [39] Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-Order Entropy-Compressed Text Indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA, pages 841–850. ACM/SIAM, 2003. URL: http://dl.acm.org/citation.cfm?id=644108.644250.
- [40] Roberto Grossi, Costas S. Iliopoulos, Robert Mercas, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, and Fatima Vayani. Circular Sequence Comparison: Algorithms and Applications. Algorithms Mol. Biol., 11:12, 2016. doi:10.1186/S13015-016-0076-6.
- [41] Roberto Grossi and Jeffrey Scott Vitter. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput., 35(2):378–407, 2005. doi:10.1137/S0097539702402354.
- [42] Wing-Kai Hon, Tsung-Han Ku, Tak Wah Lam, Rahul Shah, Siu-Lung Tam, Sharma V. Thankachan, and Jeffrey Scott Vitter. Compressing Dictionary Matching Index via Sparsification Technique. Algorithmica, 72(2):515–538, 2015. doi:10.1007/S00453-013-9863-3.
- [43] Wing-Kai Hon, Tsung-Han Ku, Chen-Hua Lu, Rahul Shah, and Sharma V. Thankachan. Efficient Algorithm for Circular Burrows-Wheeler Transform. In Juha Kärkkäinen and Jens Stoye, editors, Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Helsinki, Finland, July 3-5, 2012. Proceedings, volume 7354 of Lecture Notes in Computer Science, pages 257–268. Springer, 2012. doi:10.1007/978-3-642-31265-6_21.
- [44] Wing-Kai Hon, Tsung-Han Ku, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. Compressed Dictionary Matching with One Error. In James A. Storer and Michael W. Marcellin, editors, 2011 Data Compression Conference (DCC 2011), 29-31 March 2011, Snowbird, UT, USA, pages 113–122. IEEE Computer Society, 2011. doi:10.1109/DCC.2011.18.
- [45] Wing-Kai Hon, Tsung-Han Ku, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. Faster Compressed Dictionary Matching. Theor. Comput. Sci., 475:113–119, 2013. doi:10.1016/J.TCS.2012.10.050.
- [46] Wing-Kai Hon, Tak Wah Lam, Rahul Shah, Siu-Lung Tam, and Jeffrey Scott Vitter. Succinct Index for Dynamic Dictionary Matching. In Yingfei Dong, Ding-Zhu Du, and Oscar H. Ibarra, editors, Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18, 2009. Proceedings, volume 5878 of Lecture Notes in Computer Science, pages 1034–1043. Springer, 2009. doi:10.1007/978-3-642-10631-6_104.
- [47] Wing-Kai Hon, Tak Wah Lam, Rahul Shah, Sharma V. Thankachan, Hing-Fung Ting, and Yilin Yang. Dictionary Matching with a Bounded Gap in Pattern or in Text. Algorithmica, 80(2):698–713, 2018. doi:10.1007/S00453-017-0288-2.
- [48] Wing-Kai Hon, Chen-Hua Lu, Rahul Shah, and Sharma V. Thankachan. Succinct Indexes for Circular Patterns. In Takao Asano, Shin-Ichi Nakano, Yoshio Okamoto, and Osamu Watanabe, editors, Algorithms and Computation - 22nd International Symposium, ISAAC 2011, Yokohama, Japan, December 5-8, 2011. Proceedings, volume 7074 of Lecture Notes in Computer Science, pages 673–682. Springer, 2011. doi:10.1007/978-3-642-25591-5_69.
- [49] Costas S. Iliopoulos, Solon P. Pissis, and M. Sohel Rahman. Searching and Indexing Circular Patterns. In Mourad Elloumi, editor, Algorithms for Next-Generation Sequencing Data, Techniques, Approaches, and Applications, pages 77–90. Springer, 2017. doi:10.1007/978-3-319-59826-0_3.
- [50] Dominik Kempa and Tomasz Kociumaka. Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 1877–1886. IEEE, 2023. doi:10.1109/FOCS57990.2023.00114.
- [51] Tsvi Kopelowitz, Ely Porat, and Yaron Rozen. Succinct Online Dictionary Matching with Improved Worst-Case Guarantees. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, volume 54 of LIPIcs, pages 6:1–6:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.CPM.2016.6.
- [52] Avivit Levy and B. Riva Shalom. Online Parameterized Dictionary Matching with One Gap. Theor. Comput. Sci., 845:208–229, 2020. doi:10.1016/J.TCS.2020.09.016.
- [53] Udi Manber and Eugene W. Myers. Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing, 22(5):935–948, 1993. doi:10.1137/0222058.
- [54] Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. An Extension of the Burrows-Wheeler Transform. Theor. Comput. Sci., 387(3):298–312, 2007. doi:10.1016/J.TCS.2007.07.014.
- [55] Giovanni Manzini. XBWT Tricks. In Shunsuke Inenaga, Kunihiko Sadakane, and Tetsuya Sakai, editors, String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan, October 18-20, 2016, Proceedings, volume 9954 of Lecture Notes in Computer Science, pages 80–92, 2016. doi:10.1007/978-3-319-46049-9_8.
- [56] J. Ian Munro and Venkatesh Raman. Succinct Representation of Balanced Parentheses and Static Trees. SIAM J. Comput., 31(3):762–776, 2001. doi:10.1137/S0097539799364092.
- [57] Gonzalo Navarro and Veli Mäkinen. Compressed Full-Text Indexes. ACM Comput. Surv., 39(1):2, 2007. doi:10.1145/1216370.1216372.
- [58] Shoshana Neuburger and Dina Sokol. Succinct 2D Dictionary Matching. Algorithmica, 65(3):662–684, 2013. doi:10.1007/S00453-012-9615-9.
- [59] Enno Ohlebusch, Stefan Stauß, and Uwe Baier. Trickier XBWT Tricks. In Travis Gagie, Alistair Moffat, Gonzalo Navarro, and Ernesto Cuadros-Vargas, editors, String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Lima, Peru, October 9-11, 2018, Proceedings, volume 11147 of Lecture Notes in Computer Science, pages 325–333. Springer, 2018. doi:10.1007/978-3-030-00479-8_26.
- [60] Eric M. Osterkamp and Dominik Köppl. Extending the Parameterized Burrows-Wheeler Transform. In Ali Bilgin, James E. Fowler, Joan Serra-Sagristà, Yan Ye, and James A. Storer, editors, Data Compression Conference, DCC 2024, Snowbird, UT, USA, March 19-22, 2024, pages 143–152. IEEE, 2024. doi:10.1109/DCC58796.2024.00022.
- [61] Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct Indexable Dictionaries with Applications to Encoding -ary Trees, Prefix Sums and Multisets. ACM Trans. Algorithms, 3(4):43, 2007. doi:10.1145/1290672.1290680.
- [62] Kunihiko Sadakane. Compressed Suffix Trees with Full Functionality. Theory Comput. Syst., 41(4):589–607, 2007. doi:10.1007/S00224-006-1198-X.
- [63] Kunihiko Sadakane and Gonzalo Navarro. Fully-Functional Succinct Trees. In Moses Charikar, editor, Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 134–149. SIAM, 2010. doi:10.1137/1.9781611973075.13.
- [64] Blair L Strang and Nigel D Stow. Circularization of the Herpes Simplex Virus Type 1 Genome upon Lytic Infection. Journal of Virology, 79(19):12487–12494, 2005.
- [65] Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. c-trie++: A Dynamic Trie tailored for Fast Prefix Searches. Inf. Comput., 285(Part):104794, 2022. doi:10.1016/J.IC.2021.104794.
- [66] Esko Ukkonen. On-Line Construction of Suffix Trees. Algorithmica, 14(3):249–260, 1995. doi:10.1007/BF01206331.
- [67] Peter Weiner. Linear Pattern Matching Algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1–11, 1973. doi:10.1109/SWAT.1973.13.
