Extending the Burrows-Wheeler Transform for Cartesian Tree Matching and Constructing It

Osterkamp, Eric M.; Köppl, Dominik

doi:10.4230/LIPIcs.CPM.2025.26

Eric M. Osterkamp

University of Münster, Germany

Dominik Köppl

University of Yamanashi, Kofu, Japan

Abstract

Cartesian tree matching is a form of generalized pattern matching where a substring of the text matches the pattern if they share the same Cartesian tree. This form of matching finds application for time series of stock prices and can be of interest for melody matching between musical scores. For the indexing problem, the state-of-the-art data structure is a Burrows–Wheeler transform based solution due to [Kim and Cho, CPM’21], which uses nearly succinct space and can count the number of substrings that Cartesian tree match with a pattern in time linear in the pattern length. The authors address the construction of their data structure with a straightforward solution that, however, requires pointer-based data structures, resulting in $O(n\lg n)$ bits of space, where $n$ is the text length [Kim and Cho, CPM’21, Section A.4]. We address this bottleneck by a construction that requires $O(n\lg\sigma)$ bits of space and has a time complexity of $O(n\frac{\lg\sigma\lg n}{\lg\lg n})$, where $\sigma$ is the alphabet size. Additionally, we can extend this index for indexing multiple circular texts in the spirit of the extended Burrows–Wheeler transform without sacrificing the time and space complexities. We present this index in a dynamic variant, where we pay a logarithmic slowdown and need space linear in the input texts in bits for the extra functionality that we can incrementally add texts. Our extended setting is of interest for finding repetitive motifs common in the aforementioned applications, independent of offsets and scaling.

Keywords and phrases:

Cartesian tree matching, extended Burrows–Wheeler transform, construction algorithm, generalized pattern matching

Funding:

Dominik Köppl: JSPS KAKENHI Grant Numbers JP23H04378 and JP25K21150.

2012 ACM Subject Classification:

Theory of computation

Related Version:

Full Version: https://arxiv.org/abs/2411.12241

Acknowledgements:

We sincerely thank the anonymous reviewers for their constructive comments.

Event:

36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Editors:

Paola Bonizzoni and Veli Mäkinen

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

String matching is ubiquitous in computer science, and its variations are custom-made to solve a wide variety of problems. We here focus on a special kind of variation called substring consistent equivalence relation (SCER) [21]. Two strings $X$ and $Y$ are said to SCER-match if they have the same length and all substrings of equal length starting at the same text position SCER-match, i.e., $X[i..j]$ SCER-matches with $Y[i..j]$ for all $1\leq i\leq j\leq|X|=|Y|$ . A specific instance of SCER-matching is order-preserving matching [19, 16], which has been studied for the analysis of numerical time series. The aim of order-preserving matching is to match two strings if the relative order of the symbols in both strings is the same. Order-preserving matching therefore can find matches independently of value offsets and scaling.

Since order-preserving matching takes the global order of the symbols in a string into account, it may be too strict in applications that primarily consider the local ordering of ranges partitioned by the peaks. In fact, time series of stock prices are such a case, where a common pattern called the head-and-shoulder [8] involves one absolute peak (head) and neighboring local peaks (shoulders), where each of these neighboring peaks can be treated individually to match similar head-and-shoulder patterns with slightly changed local peaks. For such a reason, Park et al. [26] proposed Cartesian tree matching. Cartesian tree matching relaxes the notion of order-preserving matching in the sense that an order-preserving match is always a Cartesian tree match, but not necessarily the other way around. For instance, the two strings of digits $1537462$ and $2438372$ fit into the head-and-shoulder pattern, do not order-preserving match, but Cartesian tree match. For that, Cartesian tree matching compares the Cartesian trees of two strings to decide whether they match. The Cartesian tree [28] is a binary tree built upon an array of numbers. If all numbers are distinct, it is the min-heap whose in-order traversal retrieves the original array. Since its inception, Cartesian tree matching and variations thereof have attracted interest from researchers, whose studies include multiple pattern matching [26, 27], approximate matching [1, 18], substring matching [4], subsequence matching [24], indeterminate matching [10], cover matching [15], and palindromic matching [9].

In addition to stock prices, applications of generalized pattern matching can also be found in melody matching between musical scores. Pattern matching on features of musical scores, such as differences, directions, or magnitudes, has been studied [12]. However, the detection of repetitions in a piece of music has also been considered of importance [7]. The question therefore is whether we can find repetitive substrings that Cartesian tree match with a repetition of a set of melodic motifs (i.e., input texts). For that to be efficient, we want to index these motifs.

An index for Cartesian tree matching of a text string $T$ is a data structure built upon $T$ that can report the number of substrings of $T$ that Cartesian tree match a given pattern. Given $T$ has $n$ symbols drawn from an integer alphabet of size $\sigma$, Park et al. [26] and Nishimoto et al. [23] proposed indexes for Cartesian tree matching of $T$ that both occupy $O(n\lg n)$ bits of space and are constructed with $O(n\lg n)$ and $O(n\sigma\lg n)$ additional bits of space, respectively. Park et al. [26, Section 5.1] achieve $O(m\lg\sigma)$ time and Nishimoto et al. [23, Section 5.1] $O(m(\sigma+\lg m)+occ)$ time for answering count queries, where $occ$ is the number of occurrences of the pattern of length $m$. Adapting Ferragina and Manzini’s FM-index for exact string matching [5], Kim and Cho [17] proposed an index that occupies $3n+o(n)$ bits of space, and answers a count query in $O(m)$ time for a pattern of length $m$. For construction, they proposed a straightforward solution that takes as input the Cartesian suffix tree of Park et al. [26], which, however, requires $O(n\lg n)$ additional bits of space.

We address two goals in this paper. The first is a construction algorithm for Kim and Cho’s index that takes $O(n\lg\sigma)$ bits of working space, which is compact for constant alphabet sizes. While we consider an index supporting access to a character of the length-$n$ text $T$ as compact if its space is $O(n\lg\sigma)$ bits, we here restrict queries to Cartesian-tree pattern matching, and a Cartesian tree storing $n$ nodes can be represented in $2n$ bits, cf. [26, Section 3.5]. Therefore, a compact index for Cartesian-tree pattern matching takes $O(n)$ bits, a bound we reach for constant alphabets. Second, while all aforementioned indexes can partially address the problem by indexing multiple texts, it is hard to detect whether the pattern is a repetition of one of the input texts, which is of interest in case of indexing melodic motifs that can repeat. Here, our goal is an index that can find such matches, even if they start with different offsets of the same repetition. Concretely, our aim is to index multiple texts for Cartesian tree pattern matching, where the search space for a pattern consists of the texts, each considered as an infinite concatenation with itself.

In this paper, we propose an extension of the index of Kim and Cho [17] for Cartesian tree matching with techniques of the extended Burrows–Wheeler transform [20], and call the resulting data structure the cBWT index. The cBWT index is an extension in the sense that it supports indexing multiple input texts for Cartesian tree matching circularly. We show that we can compute the cBWT index in $O(n\frac{\lg\sigma\lg n}{\lg\lg n})$ time and $O(n\lg\sigma)$ bits of space, where $n$ is the total length of all texts to index. Our construction allows us to build the cBWT index incrementally, i.e., we support the incremental addition of a new string to the set of texts we index. We can also compute the original index of Kim and Cho within the same complexities. Our ideas stem from construction algorithms of indexes for parameterized matching by Hashimoto et al. [11] and Iseri et al. [14], which we recently extended for multiple circular texts [25]. During construction, the cBWT index supports the backward search and count queries for pattern strings in $O(m\frac{\lg\sigma\lg n}{\lg\lg n})$ time, where $m$ is the pattern length.

2 Preliminaries

Let $\lg=\log_{2}$ denote the logarithm to base two. We assume a random access model with word size $\Omega(\lg n)$ , where $n$ denotes the input size. An interval $\{i,i+1,..,j-1,j\}$ of integers is denoted by $[i..j]$ , where $[i..j]=\emptyset$ if $j<i$ .

Strings.

Let $\Sigma$ denote an alphabet. We call elements of $\Sigma$ symbols, a sequence of symbols from $\Sigma$ a string over $\Sigma$ , and denote the set of strings over $\Sigma$ by $\Sigma^{*}$ . Let $U,V,W\in\Sigma^{*}$ . The concatenation of $U$ and $V$ is denoted by $U\cdot V$ or $U V$ . We write $U^{k}$ if we concatenate $k$ instances of $U$ for a non-negative integer $k$ , and $U^{\omega}$ for the string obtained by infinitely concatenating $U$ . We call $U$ primitive if $U=X^{k}$ implies $U=X$ and $k=1$ . It is known that for every $U\in\Sigma^{*}$ there exists a unique primitive $X\in\Sigma^{*}$ and a unique integer $k$ such that $U=X^{k}$ , denoted by $\textup{root}(U)$ and $\textup{exp}(U)$ , respectively. If $X=UVW$ , then $U$ is called a prefix, $W$ a suffix, and $U, V, W$ substrings of $X$ . The length $\left|U\right|$ of $U$ is the number of symbols in $U$ , $\textup{lcp}(U,V)$ reports the length of the longest common prefix of $U$ and $V$ , $\varepsilon$ denotes the unique string of length $0$ , and we define $\Sigma^{+}=\Sigma^{*}-\{\varepsilon\}$ . Let $X\in\Sigma^{+}$ and $i,j\in[1..\left|X\right|]$ . Then $X[i]$ denotes the $i$ -th symbol in $X$ , and $X[i..j]=X[i]\cdots X[j]$ , where $X[i..j]=\varepsilon$ if $j<i$ , $X[..i]=X[1..i]$ , and $X[i..]=X[i..\left|X\right|]$ . For notational convenience, $X[..k]=\varepsilon$ if $k\leq 0$ , and $V[k..]=\varepsilon$ if $k\geq\left|V\right|+1$ . We call $1\leq p\leq\left|X\right|$ satisfying $X[q]=X[q+p]$ for every $q\in[1..\left|X\right|-p]$ a period of $X$ . Let $\textup{Rot}(X,0)=X$ and $\textup{Rot}(X,k+1)=\textup{Rot}(X,k)[2..]\cdot\textup{Rot}(X,k)[1]$ for each non-negative integer $k$ , i.e., $\textup{Rot}(X,k)$ denotes the $k$ -th left rotation of $X$ . The left rotations of $X$ for $k\in[0..\left|X\right|-1]$ are the conjugates of $X$ . We write $U<V$ if and only if (a) $U=V[..\left|U\right|]$ and $\left|U\right|<\left|V\right|$ or (b) $U[\textup{lcp}(U,V)+1]<V[\textup{lcp}(U,V)+1]$ .

$i$	1	2	3	4	5	6	7	8
$V[i]$	$\mathtt{7}$	$\mathtt{14}$	$\mathtt{5}$	$\mathtt{1}$	$\mathtt{11}$	$\mathtt{27}$	$\mathtt{11}$	$\mathtt{7}$

Figure 1: Example integer string $V$ to illustrate queries. Here, $\textup{rank}_{11}(V,5)=1$, $\textup{rnkcnt}_{V}(2,7,7,14)=3$, $\textup{select}_{11}(V,2)=7$, $\textup{RNV}_{V}(1,5,8)=11$, and $\textup{MI}_{V}(5,6)=[4..8]$.

Let $V\in\Sigma^{+}$ be a (dynamic) string, $c\leq d\in\Sigma$ , $i\in[1..\left|V\right|+1]$ , $j\in[0..\left|V\right|]$ and $k\in[1..\left|V\right|]$ . Then $\textup{insert}_{V}(i,c)$ inserts the symbol $c$ at position $i$ of $V$ , $\textup{delete}_{V}(k)$ deletes the $k$ -th entry of $V$ , $\textup{rank}_{c}(V,j)$ returns the number of occurrences of $c$ in $V[..j]$ , $\textup{rnkcnt}_{V}(i,j,c,d)$ returns $\left|\{x\in[i..j]\mid c\leq V[x]\leq d\}\right|$ , $\textup{select}_{c}(V,i)$ returns the index of the $i$ -th occurrence of $c$ in $V$ if $i\leq\textup{rank}_{c}(V,\left|V\right|)$ , $\textup{RNV}_{V}(i,j,c)$ returns the smallest value in $V[i..j]$ larger than $c$ if it exists, and $\textup{MI}_{V}(j,c)$ returns the maximal interval $[\ell..r]$ such that $0\leq\ell\leq j\leq r\leq\left|V\right|$ and $V[x]\geq c$ for every $x\in[\ell+1..r]$ . See Figure 1 for examples. We represent strings by the following dynamic data structure, which supports the aforementioned operations and queries.

Lemma 2.1 ([14, Lemma 4]).

A dynamic string of length $n$ over $[0..\sigma]$ with $\sigma\leq n^{O(1)}$ can be stored in a data structure occupying $O(n\lg\sigma)$ bits of space, supporting insertion, deletion and the queries access, rank, rnkcnt, select, RNV and MI in $O(\frac{\lg\sigma\lg n}{\lg\lg n})$ time.
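To make the semantics of the queries concrete, the following is a naive Python sketch that evaluates each query by a linear scan. It is only a reference for what the queries return, not the data structure of Lemma 2.1; the function names and the use of `None` for "does not exist" are assumptions made for illustration.

```python
# Naive linear-scan versions of the queries; positions are 1-based as in the text.
# This is NOT the structure of Lemma 2.1, which answers these queries much faster.

def rank(V, c, j):
    """Number of occurrences of c in V[..j]."""
    return sum(1 for x in V[:j] if x == c)

def rnkcnt(V, i, j, c, d):
    """Number of positions x in [i..j] with c <= V[x] <= d."""
    return sum(1 for x in V[i - 1:j] if c <= x <= d)

def select(V, c, i):
    """Position of the i-th occurrence of c in V, or None if there is none."""
    seen = 0
    for pos, x in enumerate(V, start=1):
        if x == c:
            seen += 1
            if seen == i:
                return pos
    return None

def rnv(V, i, j, c):
    """Smallest value in V[i..j] that is larger than c, or None."""
    larger = [x for x in V[i - 1:j] if x > c]
    return min(larger) if larger else None

def mi(V, j, c):
    """Maximal (l, r) with l <= j <= r and V[x] >= c for every x in [l+1..r]."""
    l = j
    while l >= 1 and V[l - 1] >= c:      # V[l] >= c allows lowering l by one
        l -= 1
    r = j
    while r < len(V) and V[r] >= c:      # V[r+1] >= c allows raising r by one
        r += 1
    return (l, r)

V = [7, 14, 5, 1, 11, 27, 11, 7]         # the string of Figure 1
print(rank(V, 11, 5), rnkcnt(V, 2, 7, 7, 14), select(V, 11, 2),
      rnv(V, 1, 5, 8), mi(V, 5, 6))
```

On the string of Figure 1, the calls in the last line correspond to the five example queries listed in the figure caption.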

Alphabet.

Throughout, we will work with the integer alphabet $\Sigma=[0..\sigma]$ , where $\sigma\leq n^{O(1)}$ , and a special symbol $\$\not\in\Sigma$ stipulated to be smaller than any symbol from $\Sigma$ . The special symbol is motivated by the construction algorithm, and as a delimiter when a string should not be considered circular, as in the index of Kim and Cho [17]. Let $\Sigma_{\$}=\Sigma\cup\{\$\}$ .

Cartesian Tree Matching.

The Cartesian tree $\textup{ct}(V)$ of a string $V\in\Sigma_{\$}^{*}$ is a binary tree defined as follows. If $V=\varepsilon$ , then $\textup{ct}(V)$ is the empty tree. If $V\neq\varepsilon$ , let $i$ denote the position of the smallest symbol in $V$ , where ties are broken with respect to text position. Then $\textup{ct}(V)$ has $V[i]$ as its root, $\textup{ct}(V[..i-1])$ as its left subtree and $\textup{ct}(V[i+1..])$ as its right subtree. We say that two strings $U,V\in\Sigma_{\$}^{*}$ Cartesian tree match (ct-match) if and only if $\textup{ct}(U)=\textup{ct}(V)$ , and write $U\approx V$ . For instance, in Figure 2 the substring $T[3..6]=\mathtt{5178}$ of $T=\mathtt{625178265}$ ct-matches $P=\mathtt{7347}$ while $T[4..7]=\mathtt{1782}$ does not.
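The recursive definition can be turned directly into a short Python sketch. It stores only the shape of the tree as nested pairs, under the assumption that ct-matching compares shapes rather than node labels, and it breaks ties by taking the leftmost minimum, which is our reading of "with respect to text position". The function names are hypothetical.

```python
def ct(V):
    """Shape of the Cartesian tree of V as nested pairs (left, right).

    None stands for the empty tree. Node labels are dropped on purpose,
    since two strings ct-match when their trees have the same shape.
    """
    if not V:
        return None
    i = V.index(min(V))                  # leftmost minimum (assumed tie-break)
    return (ct(V[:i]), ct(V[i + 1:]))

def ct_match(U, V):
    return ct(U) == ct(V)

T = [6, 2, 5, 1, 7, 8, 2, 6, 5]
P = [7, 3, 4, 7]
print(ct_match(T[2:6], P))               # T[3..6] = 5178, prints True
print(ct_match(T[3:7], P))               # T[4..7] = 1782, prints False
```

This recursion takes quadratic time in the worst case; it is meant to mirror the definition, not to be efficient.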

Figure 2: Cartesian trees of $T=\mathtt{625178265}$, $P=\mathtt{7347}$, $T[3..6]=\mathtt{5178}$, and $T[4..7]=\mathtt{1782}$, shown in panels (a) $\textup{ct}(T)$, (b) $\textup{ct}(P)$, (c) $\textup{ct}(T[3..6])$, and (d) $\textup{ct}(T[4..7])$. Here, $P\approx T[3..6]$ and $P\not\approx T[4..7]$.

Parent Distance Encoding.

Park et al. [26] use an encoding scheme for representing Cartesian trees that reduces the computation of a ct-match to checking whether the encoded strings exactly match. We here give a variant thereof which takes the special symbol $\$$ into account. Let $\infty\not\in\Sigma_{\$}$ denote a symbol larger than any integer. The parent distance encoding $\langle V\rangle$ of $V\in\Sigma_{\$}^{*}$ is a string of length $\left|V\right|$ over $\Sigma_{\$}\cup\{\infty\}$ such that

$$\langle V\rangle[i]=\begin{cases}\infty&\text{if }\$\neq V[i]<\min\{V[j]\mid j\in[..i-1]\}\text{,}\\ V[i]&\text{if }V[i]=\$\text{,}\\ i-\max\{j\in[..i-1]\mid V[j]\leq V[i]\}&\text{otherwise,}\end{cases}$$

for each $i\in[1..\left|V\right|]$. We have $\langle V\rangle[..i]=\langle V[..i]\rangle$ for each $V\in\Sigma_{\$}^{+}$ and $i\in[1..\left|V\right|]$. For example, $\langle\mathtt{41327\$3}\rangle=\mathtt{\infty\infty 121\$1}$.¹

¹ As a side note, the parent distance encoding has been leveraged for devising compact $O(n)$-bit data structures answering range minimum queries, such as the 2d-Min-Heap by Fischer [6] and the LRM-trees by Sadakane and Navarro [22], which semantically coincide.

Lemma 2.2 ([26, Theorem 1]).

Let $U,V\in\Sigma^{*}$ . Then $U\approx V\Leftrightarrow\langle U\rangle=\langle V\rangle$ .
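The parent distance encoding and the equivalence of Lemma 2.2 can be illustrated with a small Python sketch. It uses the standard stack technique for nearest smaller-or-equal values, which is our choice and not prescribed by the text. The constants `INF` and `DOLLAR` are illustrative stand-ins for $\infty$ and $\$$; representing $\$$ by $-1$ assumes that all regular symbols are non-negative integers.

```python
INF = float("inf")    # stand-in for the symbol "infinity"
DOLLAR = -1           # stand-in for $, assumed smaller than every regular symbol

def parent_distance(V):
    """Parent distance encoding of V, computed left to right with a stack."""
    enc, stack = [], []               # stack: 1-based positions, values non-decreasing
    for i, v in enumerate(V, start=1):
        while stack and V[stack[-1] - 1] > v:
            stack.pop()               # positions with larger values cannot be parents
        if v == DOLLAR:
            enc.append(DOLLAR)        # $ is copied unchanged
        elif not stack:
            enc.append(INF)           # v is smaller than every earlier symbol
        else:
            enc.append(i - stack[-1]) # distance to the nearest j < i with V[j] <= V[i]
        stack.append(i)
    return enc

# Example from the text: <41327$3> = inf inf 1 2 1 $ 1
print(parent_distance([4, 1, 3, 2, 7, DOLLAR, 3]))

# Lemma 2.2 on the strings of Figure 2: equal encodings iff ct-match
print(parent_distance([5, 1, 7, 8]) == parent_distance([7, 3, 4, 7]))   # True
print(parent_distance([1, 7, 8, 2]) == parent_distance([7, 3, 4, 7]))   # False
```

Each position is pushed and popped at most once, so the sketch runs in time linear in $\left|V\right|$.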

Problem Statement.

We are interested in a solution to the following problem.

Problem (Count).

Given $\emptyset\neq\mathcal{T}\subset\Sigma^{+}$ and $P\in\Sigma^{*}$ , count each of the conjugates of the strings in $\mathcal{T}$ whose infinite iteration has a prefix ct-matching $P$ .

Throughout, let $\emptyset\neq\mathcal{T}=\{T_{1},...,T_{d}\}\subset\Sigma^{+}$. Our running example consists of the strings $T_{1}=\mathtt{512}$, $T_{2}=\mathtt{5363}$ and $T_{3}=\mathtt{4478}$ over $\Sigma=[0..8]$. Given our running example, the solution to Count for $P_{1}=\mathtt{643}$ and $P_{2}=\mathtt{5634}$ is $0$ and $2$, respectively. Here, $P_{2}\approx\textup{Rot}(T_{3},2)^{\omega}[..4]\approx\textup{Rot}(T_{1},2)^{\omega}[..4]$ since $\langle P_{2}\rangle=\mathtt{\infty 1\infty 1}=\langle\textup{Rot}(T_{3},2)^{\omega}[..4]\rangle=\langle\textup{Rot}(T_{1},2)^{\omega}[..4]\rangle$.
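As a baseline for the Count problem, the following Python sketch checks every conjugate directly by comparing parent distance encodings (Lemma 2.2). It is a brute-force reference, not the index developed in this paper, and it assumes texts and patterns without the special symbol $\$$. The helper names are hypothetical.

```python
INF = float("inf")    # stand-in for the symbol "infinity"

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def count(texts, P):
    """Number of conjugates whose infinite iteration has a prefix ct-matching P."""
    m, target, total = len(P), encode(P), 0
    for T in texts:
        for k in range(len(T)):
            rot = T[k:] + T[:k]                               # Rot(T, k)
            prefix = [rot[x % len(rot)] for x in range(m)]    # Rot(T, k)^omega[..m]
            if encode(prefix) == target:
                total += 1
    return total

texts = [[5, 1, 2], [5, 3, 6, 3], [4, 4, 7, 8]]               # running example
print(count(texts, [6, 4, 3]))       # P_1, prints 0
print(count(texts, [5, 6, 3, 4]))    # P_2, prints 2
```

The two matches for $P_{2}$ are the conjugates $\mathtt{251}$ and $\mathtt{7844}$, in line with the example above.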

3 $O(n\lg\sigma)$ -bit Index

Let $n=\left|T_{1}\cdots T_{d}\right|$ , $n_{k}=\left|T_{k}\right|$ for each $k\in[1..d]$ , and $C_{\mathcal{T}}({i})=\textup{Rot}(T_{j},i-1-\sum_{k=1}^{j-1}n_{k})$ for each $i\in[1..n]$ , where $j=\min\{h\in[1..d]\mid\sum_{k=1}^{h}n_{k}\geq i\}$ , i.e., we identify each conjugate of each text $T_{1},...,T_{d}$ with its start position inside the concatenation $T_{1}\cdots T_{d}$ , such that we give them ranks from $1$ to $n$ . See Figure 3 for an example. In what follows, we put these ranks in a specific order by a permutation of $[1..n]$ such that the permuted ranks of all conjugates with prefixes of their infinite concatenation ct-matching a pattern form an interval $[\ell..r]\subseteq[1..n]$ .

3.1 Conjugate Array

We express the permutation of the ranks of all conjugates by the so-called conjugate array, which we will subsequently define. To achieve this, we extend the ideas of Mantaci et al. [20] to Cartesian tree matching, and introduce a preorder on $\Sigma_{\$}^{+}$. For notational convenience, we define the rotational parent distance encoding $\langle V\rangle_{\textup{r}}$ of $V\in\Sigma_{\$}^{+}$ by $\langle V\rangle_{\textup{r}}=\langle V^{2}\rangle[\left|V\right|+1..]$. For any $V,U\in\Sigma_{\$}^{+}$, let $V\preceq_{\omega}U$ if and only if there exists some natural number $i$ such that $\langle V^{\omega}[..i]\rangle<\langle U^{\omega}[..i]\rangle$ or $\textup{root}(\langle V\rangle_{\textup{r}})=\textup{root}(\langle U\rangle_{\textup{r}})$ holds.

Lemma 3.1.

The relation $\preceq_{\omega}$ defines a total preorder on $\Sigma_{\$}^{+}$ , i.e., the relation $\preceq_{\omega}$ is binary, reflexive, transitive and connected.

We call this preorder the $\omega$ -preorder. We write $V=_{\omega}U$ if and only if $V\preceq_{\omega}U\land U\preceq_{\omega}V$ , and $V\prec_{\omega}U$ if and only if $V\preceq_{\omega}U\land V\not=_{\omega}U$ . For instance, $T_{3}=\mathtt{4478}\prec_{\omega}\mathtt{125}=\textup{Rot}(T_{1},1)$ and $T_{2}=\mathtt{5363}=_{\omega}\mathtt{6353}=\textup{Rot}(T_{2},2)$ . Note that the latter example violates antisymmetry, i.e., the relation $\preceq_{\omega}$ is not a total order. In the following, we present a convenient but not necessarily optimal way to compute the $\omega$ -preorder of two given strings.

Lemma 3.2 (Weak Periodicity Lemma).

Let $p$ and $q$ be two periods of a string $X$ . If $p+q\leq\left|X\right|$ , then $\gcd(p,q)$ is also a period of $X$ .

Lemma 3.3 ([13, Lemma 5]).

Let $V,U\in\Sigma_{\$}^{+}$ and $z=\max\{\left|V\right|,\left|U\right|\}$ . Then $V=_{\omega}U$ if and only if $\langle V^{\omega}[..3z]\rangle=\langle U^{\omega}[..3z]\rangle$ .

Proof.

Without loss of generality, $\left|U\right|=z$ . Let $\left|V\right|=i$ , $\left|\textup{root}(\langle V\rangle_{\textup{r}})\right|=j$ and $\left|\textup{root}(\langle U\rangle_{\textup{r}})\right|=k$ .

$(\Leftarrow)$

Assume $\langle V^{\omega}[..3z]\rangle=\langle U^{\omega}[..3z]\rangle$ . Then, on the one hand,

$$\langle\textup{Rot}(V,z)\rangle_{\textup{r}}^{\omega}[..2z]=\langle V^{\omega}[..3z]\rangle[z+1..]=\langle U^{\omega}[..3z]\rangle[z+1..]=\langle U\rangle_{\textup{r}}\cdot\langle U\rangle_{\textup{r}}.$$

Since the rotational parent distance encoding commutes with left rotations, $j$ is a period of $\langle\textup{Rot}(V,z)\rangle_{\textup{r}}$. Consequently, both $z$ and $j$ are periods of $\langle U\rangle_{\textup{r}}\cdot\langle U\rangle_{\textup{r}}$. Since $j+z\leq 2z=\left|\langle U\rangle_{\textup{r}}\cdot\langle U\rangle_{\textup{r}}\right|$, we can apply Lemma 3.2 and find that $\gcd(j,z)$ is a period of $\langle U\rangle_{\textup{r}}\cdot\langle U\rangle_{\textup{r}}$. As $\gcd(j,z)$ divides $z=\left|\langle U\rangle_{\textup{r}}\right|$, $\langle U\rangle_{\textup{r}}$ can be formed by repeating $\langle U\rangle_{\textup{r}}[..\gcd(j,z)]$ an integral number of times, which implies $\gcd(j,z)\geq k$, i.e., $i\geq j\geq k$. On the other hand,

$$\langle V\rangle_{\textup{r}}\cdot\langle V\rangle_{\textup{r}}=\langle V^{\omega}[..3z]\rangle[i+1..3i]=\langle U^{\omega}[..3z]\rangle[i+1..3i]=\langle\textup{Rot}(U,i)\rangle_{\textup{r}}^{\omega}[..2i].$$

Since the rotational parent distance encoding commutes with left rotations, $k$ is a period of $\langle\textup{Rot}(U,i)\rangle_{\textup{r}}$. As $k\leq i$, $k$ is a period of $\langle V\rangle_{\textup{r}}\cdot\langle V\rangle_{\textup{r}}$ in addition to $i$. Then $k+i\leq 2i=\left|\langle V\rangle_{\textup{r}}\cdot\langle V\rangle_{\textup{r}}\right|$ and Lemma 3.2 imply that $\gcd(k,i)$ is a period of $\langle V\rangle_{\textup{r}}\cdot\langle V\rangle_{\textup{r}}$. As $\gcd(k,i)$ divides $i=\left|\langle V\rangle_{\textup{r}}\right|$, $\langle V\rangle_{\textup{r}}[..\gcd(k,i)]$ can be repeated an integral number of times to form $\langle V\rangle_{\textup{r}}$, which implies $\gcd(k,i)\geq j$. Consequently, $k\geq j$, and therefore $\left|\textup{root}(\langle V\rangle_{\textup{r}})\right|=j=k=\left|\textup{root}(\langle U\rangle_{\textup{r}})\right|$. In particular, $\textup{root}(\langle V\rangle_{\textup{r}})=\langle V^{\omega}[..3z]\rangle[3z-j+1..]=\langle U^{\omega}[..3z]\rangle[3z-k+1..]=\textup{root}(\langle U\rangle_{\textup{r}})$, i.e., $V=_{\omega}U$.

$(\Rightarrow)$

Let $V=_{\omega}U$. Assume $\langle V^{\omega}[..3z]\rangle\neq\langle U^{\omega}[..3z]\rangle$ with $x$ minimal such that $\langle V^{\omega}[..3z]\rangle[x]\neq\langle U^{\omega}[..3z]\rangle[x]$. If $\max\{\langle V^{\omega}[..3z]\rangle[x],\langle U^{\omega}[..3z]\rangle[x]\}<\infty$, then $\textup{Rot}(\textup{root}(\langle V\rangle_{\textup{r}}),x-1)[1]=\langle V^{\omega}[..3z]\rangle[x]\neq\langle U^{\omega}[..3z]\rangle[x]=\textup{Rot}(\textup{root}(\langle U\rangle_{\textup{r}}),x-1)[1]$, which contradicts $V=_{\omega}U$. Hence, and without loss of generality, assume $\langle V^{\omega}[..3z]\rangle[x]=\infty$. Then $\langle U^{\omega}[..3z]\rangle[x]<\infty$, $x\leq\left|\textup{root}(\langle V\rangle_{\textup{r}})\right|$, and consequently $\textup{root}(\langle U\rangle_{\textup{r}})[x]=\langle U^{\omega}[..3z]\rangle[x]<x\leq\textup{root}(\langle V\rangle_{\textup{r}})[x]$, a contradiction to $V=_{\omega}U$. Thus, $\langle V^{\omega}[..3z]\rangle=\langle U^{\omega}[..3z]\rangle$.

$\hfill\blacktriangleleft$

Corollary 3.4.

Let $V,U\in\Sigma_{\$}^{+}$ and $z=\max\{\left|V\right|,\left|U\right|\}$ . Then $V\prec_{\omega}U$ if and only if $\langle V^{\omega}[..3z]\rangle<\langle U^{\omega}[..3z]\rangle$ .
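Lemma 3.3 and Corollary 3.4 suggest a simple way to compare two strings in the $\omega$-preorder: encode the first $3z$ symbols of both infinite iterations and compare the encodings lexicographically. The Python sketch below does exactly that for strings without $\$$ (an assumption made to keep it short); `compare_omega` and the other names are hypothetical.

```python
INF = float("inf")    # stand-in for the symbol "infinity", larger than any integer

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def omega_encoding(V, length):
    """Encoding of V^omega[..length]."""
    return encode([V[x % len(V)] for x in range(length)])

def compare_omega(V, U):
    """-1 if V strictly precedes U, 0 if they are omega-equal, 1 otherwise."""
    z = max(len(V), len(U))
    a, b = omega_encoding(V, 3 * z), omega_encoding(U, 3 * z)
    return (a > b) - (a < b)          # Python compares lists lexicographically

print(compare_omega([4, 4, 7, 8], [1, 2, 5]))        # T_3 vs Rot(T_1, 1), prints -1
print(compare_omega([5, 3, 6, 3], [6, 3, 5, 3]))     # T_2 vs Rot(T_2, 2), prints 0
```

Both outputs agree with the two examples given after Lemma 3.1. As the text notes, this is convenient rather than optimal: each comparison encodes $3z$ symbols from scratch.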

Similarly to Boucher et al. [2], we define the conjugate array $\textup{CA}_{\mathcal{T}}$ of $\mathcal{T}$ as the string of length $n$ over $[1..n]$ such that $\textup{CA}_{\mathcal{T}}[i]=j$ if and only if

$$i-1=|\{k\in[1..n]\mid C_{\mathcal{T}}({k})\prec_{\omega}C_{\mathcal{T}}({j})\text{ or }C_{\mathcal{T}}({k})=_{\omega}C_{\mathcal{T}}({j})\land k<j\}|,$$

i.e., $i-1$ is the number of all conjugates smaller than $C_{\mathcal{T}}({j})$ according to $\omega$ -preorder, where we break ties first with respect to text index, and then with respect to text position. By resolving all ties this way, we ensure that $\textup{CA}_{\mathcal{T}}$ is well-defined. Since $\textup{CA}_{\mathcal{T}}$ is a permutation, its inverse, which we denote by $\textup{CA}_{\mathcal{T}}^{-1}$ , is also well-defined. See Figure 3 for our running example’s conjugate array.
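The definition of the conjugate array can be mirrored by a comparison sort, using Corollary 3.4 to compare conjugates and the rank $k<j$ as tie-break. The following Python sketch is a naive illustration that assumes texts without $\$$; it is not the construction algorithm of this paper, and all function names are hypothetical.

```python
from functools import cmp_to_key

INF = float("inf")    # stand-in for the symbol "infinity"

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def omega_encoding(V, length):
    return encode([V[x % len(V)] for x in range(length)])

def conjugates(texts):
    """C_T(1), ..., C_T(n), ordered by start position in T_1 ... T_d."""
    return [T[k:] + T[:k] for T in texts for k in range(len(T))]

def conjugate_array(texts):
    C = conjugates(texts)

    def cmp(j, k):                           # j and k are 1-based ranks
        z = max(len(C[j - 1]), len(C[k - 1]))
        a = omega_encoding(C[j - 1], 3 * z)
        b = omega_encoding(C[k - 1], 3 * z)
        if a != b:
            return -1 if a < b else 1        # strict omega-order (Corollary 3.4)
        return j - k                         # omega-equal: smaller rank first

    return sorted(range(1, len(C) + 1), key=cmp_to_key(cmp))

texts = [[5, 1, 2], [5, 3, 6, 3], [4, 4, 7, 8]]
CA = conjugate_array(texts)
print(CA)                                    # [8, 9, 2, 5, 7, 10, 3, 11, 1, 4, 6]
```

For the running example, the result coincides with the column $\textup{CA}_{\mathcal{T}}[i]$ of Figure 3.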

$i$	$C_{\mathcal{T}}({i})$	$\langle C_{\mathcal{T}}({i})^{\omega}[..12]\rangle$	$\textup{CA}_{\mathcal{T}}^{-1}[i]$	$\textup{CA}_{\mathcal{T}}[i]$	$\langle C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})^{\omega}[..12]\rangle$	$C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})$
1	$\mathtt{512}$	$\mathtt{\infty\infty 1131131131}$	$\mathtt{9}$	$\mathtt{8}$	$\mathtt{\infty 11131113111}$	$\mathtt{4478}$
2	$\mathtt{125}$	$\mathtt{\infty 11311311311}$	$\mathtt{3}$	$\mathtt{9}$	$\mathtt{\infty 11311131113}$	$\mathtt{4784}$
3	$\mathtt{251}$	$\mathtt{\infty 1\infty 113113113}$	$\mathtt{7}$	$\mathtt{2}$	$\mathtt{\infty 11311311311}$	$\mathtt{125}$
4	$\mathtt{5363}$	$\mathtt{\infty\infty 1212121212}$	$\mathtt{10}$	$\mathtt{5}$	$\mathtt{\infty 12121212121}$	$\mathtt{3635}$
5	$\mathtt{3635}$	$\mathtt{\infty 12121212121}$	$\mathtt{4}$	$\mathtt{7}$	$\mathtt{\infty 12121212121}$	$\mathtt{3536}$
6	$\mathtt{6353}$	$\mathtt{\infty\infty 1212121212}$	$\mathtt{11}$	$\mathtt{10}$	$\mathtt{\infty 1\infty 111311131}$	$\mathtt{7844}$
7	$\mathtt{3536}$	$\mathtt{\infty 12121212121}$	$\mathtt{5}$	$\mathtt{3}$	$\mathtt{\infty 1\infty 113113113}$	$\mathtt{251}$
8	$\mathtt{4478}$	$\mathtt{\infty 11131113111}$	$\mathtt{1}$	$\mathtt{11}$	$\mathtt{\infty\infty 1113111311}$	$\mathtt{8447}$
9	$\mathtt{4784}$	$\mathtt{\infty 11311131113}$	$\mathtt{2}$	$\mathtt{1}$	$\mathtt{\infty\infty 1131131131}$	$\mathtt{512}$
10	$\mathtt{7844}$	$\mathtt{\infty 1\infty 111311131}$	$\mathtt{6}$	$\mathtt{4}$	$\mathtt{\infty\infty 1212121212}$	$\mathtt{5363}$
11	$\mathtt{8447}$	$\mathtt{\infty\infty 1113111311}$	$\mathtt{8}$	$\mathtt{6}$	$\mathtt{\infty\infty 1212121212}$	$\mathtt{6353}$

Figure 3: The conjugate array $\textup{CA}_{\mathcal{T}}$ of our running example $\mathcal{T}=\{\mathtt{512},\mathtt{5363},\mathtt{4478}\}$.

We define the conjugate range $\textup{CR}_{\mathcal{T}}(P)$ of a pattern $P\in\Sigma^{*}$ of length $m$ in $\mathcal{T}$ as a maximal interval $[\ell..r]\subseteq[1..n]$ such that $P\approx C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})^{\omega}[..m]$ for every $i\in[\ell..r]$ . Leveraging Lemma 2.2, we find that the conjugate range is well-defined.

Corollary 3.5.

Let $\emptyset\neq\mathcal{T}=\{T_{1},...,T_{d}\}\subset\Sigma_{\$}^{+}$ , $n=\left|T_{1}\cdots T_{d}\right|$ , $P\in\Sigma^{*}$ , and $m=\left|P\right|$ . Then $P\approx C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})^{\omega}[..m]$ if and only if $i\in\textup{CR}_{\mathcal{T}}(P)$ .

We have $\textup{CR}_{\mathcal{T}}(\varepsilon)=[1..n]$ because the empty string is a prefix of every conjugate. Indexes related to the FM-index usually compute $\textup{CR}_{\mathcal{T}}(P)$ for a pattern $P\in\Sigma^{*}$ with an algorithmic technique called backward search. The backward search for $P$ in $\mathcal{T}$ processes $P$ backwards in an online manner and refines $\textup{CR}_{\mathcal{T}}(P[i+1..])$ to $\textup{CR}_{\mathcal{T}}(P[i..])$ at the $i$-th step on reading $P[i]$, with $P[|P|+1..]=\varepsilon$. Finally, the length of $\textup{CR}_{\mathcal{T}}(P)$ is the solution to Count by Corollary 3.5. For our running example, $\textup{CR}_{\mathcal{T}}(\mathtt{643})=\emptyset$ and $\textup{CR}_{\mathcal{T}}(\mathtt{5634})=[6..7]$. Below we define the necessary tools to allow for an efficient backward search.
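To see the conjugate range on the running example without any backward search, one can scan the conjugate array and test each entry against the pattern. The Python sketch below does this with the array copied from Figure 3; it assumes strings without $\$$, uses hypothetical names, and returns `None` for the empty interval.

```python
INF = float("inf")    # stand-in for the symbol "infinity"

TEXTS = [[5, 1, 2], [5, 3, 6, 3], [4, 4, 7, 8]]       # running example
CA = [8, 9, 2, 5, 7, 10, 3, 11, 1, 4, 6]              # copied from Figure 3

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def conjugate_range(P):
    """CR_T(P) as a pair (l, r) of 1-based bounds, or None if it is empty."""
    C = [T[k:] + T[:k] for T in TEXTS for k in range(len(T))]
    m, target = len(P), encode(P)
    hits = []
    for i, j in enumerate(CA, start=1):
        conj = C[j - 1]
        prefix = [conj[x % len(conj)] for x in range(m)]   # C_T(CA[i])^omega[..m]
        if encode(prefix) == target:
            hits.append(i)
    if not hits:
        return None
    # matching positions are expected to be consecutive in the conjugate array
    assert hits == list(range(hits[0], hits[-1] + 1))
    return (hits[0], hits[-1])

print(conjugate_range([5, 6, 3, 4]))    # (6, 7)
print(conjugate_range([6, 4, 3]))       # None
```

The scan takes time proportional to $n$ times the pattern length, which is what the backward search is designed to avoid.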

3.2 LF-mapping

We want to define a map that allows us to cycle backwards through each of the texts to be indexed by mapping to the position of a conjugate in the conjugate array from the position of its left rotation. However, if we want to represent this mapping space-efficiently, we have to be careful since we used tie-breaks within texts when we sorted the conjugates by a preorder. Our idea is to relax the requirements and define a map that allows us to cycle backwards through a text that is $\omega$ -equal to the original to dodge any issues arising from our tie-breaks. For that, we want to cycle backwards in text order inside the roots of each $\langle..\rangle_{\textup{r}}$ -encoded text. Whenever we want to move backwards at the starting position of a root, we jump to its end position. We express this backwards movement with the following permutation. For every $i\in[1..n]$ with $j=\min\{h\in[1..d]\mid\sum_{k=1}^{h}n_{k}\geq i\}$ , let

$$\textup{prev}_{\mathcal{T}}(i)=\begin{cases}i-1+\left|\textup{root}(\langle T_{j}\rangle_{\textup{r}})\right|&\text{if }C_{\mathcal{T}}({i})=_{\omega}T_{j}\text{,}\\ i-1&\text{otherwise.}\end{cases}$$

For our running example, $\textup{prev}_{\mathcal{T}}=(1\;3\;2)(4\;5)(6\;7)(8\;11\;10\;9)$. Here, we observe that we have two cycles $(4\;5)$ and $(6\;7)$ corresponding to $T_{2}[1..2]=\mathtt{53}$ and $T_{2}[3..4]=\mathtt{63}$, respectively, which are $\omega$-equal to the original text $T_{2}=\mathtt{5363}$. Now we are ready to express the LF-mapping in terms of the function $\textup{prev}_{\mathcal{T}}$. The LF-mapping $\textup{LF}_{\mathcal{T}}$ of $\mathcal{T}$ is a string of length $n$ over $[1..n]$ such that $\textup{LF}_{\mathcal{T}}[i]=\textup{CA}_{\mathcal{T}}^{-1}[\textup{prev}_{\mathcal{T}}(\textup{CA}_{\mathcal{T}}[i])]$ for each $i\in[1..n]$. Since the LF-mapping is a permutation, it has an inverse $\textup{LF}_{\mathcal{T}}^{-1}$, which we call the FL-mapping of $\mathcal{T}$ and denote by $\textup{FL}_{\mathcal{T}}$. See Figure 5 for an example. The LF-mapping is at the core of the backward search. However, storing LF-mapping and FL-mapping in their plain form creates the need for two integer arrays of length $n$ and entries of $\lg n$ bits, which motivates the following encoding scheme. Let the rotational Cartesian tree signature encoding $\llbracket V\rrbracket$ of $V\in\Sigma_{\$}^{+}$ denote a string of length $\left|V\right|$ over $\Sigma_{\$}$ such that

$$\llbracket V\rrbracket[i]=\begin{cases}V[i]&\text{if }V[i]=\$\text{,}\\ \textup{rank}_{\infty}(\langle\textup{Rot}(V,i)\rangle,\left|V\right|)-\textup{rank}_{\infty}(\langle V[i]\cdot\textup{Rot}(V,i)\rangle[2..],\left|V\right|)&\text{otherwise,}\end{cases}$$

for each $i\in[1..\left|V\right|]$, i.e., if $V[i]\neq\$$, then $\llbracket V\rrbracket[i]$ reports the number of positions $j\in[1..\left|V\right|]$ such that $\textup{Rot}(V,i)[j]\geq V[i]$ and $\langle\textup{Rot}(V,i)\rangle[j]=\infty$. See Figure 4 for an example. Note that $\llbracket V\rrbracket[i..\textup{select}_{\$}(V,1)]=\llbracket V[i..\textup{select}_{\$}(V,1)]\rrbracket$ for each $V\in\Sigma_{\$}^{+}$ satisfying $\textup{rank}_{\$}(V,\left|V\right|)\geq 1$ and $i\in[1..\textup{select}_{\$}(V,1)]$.
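The permutation $\textup{prev}_{\mathcal{T}}$ and the LF-mapping defined earlier in this subsection can be computed naively from their definitions. The Python sketch below does so for the running example, taking the conjugate array from Figure 3 as given and testing $\omega$-equality via Lemma 3.3. It assumes texts without $\$$, and all names are hypothetical.

```python
INF = float("inf")    # stand-in for the symbol "infinity"

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def omega_encoding(V, length):
    return encode([V[x % len(V)] for x in range(length)])

def root_length(S):
    """Length of the primitive root of the sequence S."""
    n = len(S)
    return next(p for p in range(1, n + 1)
                if n % p == 0 and S[:p] * (n // p) == S)

def prev_permutation(texts):
    """prev_T as a dict on 1-based ranks, following the two cases of its definition."""
    prev, offset = {}, 0
    for T in texts:
        n_j = len(T)
        r = root_length(encode(T + T)[n_j:])         # |root(<T_j>_r)|
        for k in range(n_j):
            i = offset + k + 1                       # rank of Rot(T_j, k)
            conj = T[k:] + T[:k]
            omega_equal = omega_encoding(conj, 3 * n_j) == omega_encoding(T, 3 * n_j)
            prev[i] = i - 1 + r if omega_equal else i - 1
        offset += n_j
    return prev

def lf_mapping(texts, CA):
    prev = prev_permutation(texts)
    inv = {j: i for i, j in enumerate(CA, start=1)}  # CA^{-1}
    return [inv[prev[j]] for j in CA]                # LF[i] = CA^{-1}[prev(CA[i])]

texts = [[5, 1, 2], [5, 3, 6, 3], [4, 4, 7, 8]]
CA = [8, 9, 2, 5, 7, 10, 3, 11, 1, 4, 6]             # copied from Figure 3
print(prev_permutation(texts))
print(lf_mapping(texts, CA))
```

The first printed mapping can be compared against the cycle notation $(1\;3\;2)(4\;5)(6\;7)(8\;11\;10\;9)$ given above, where each cycle lists a rank followed by its image.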

Lemma 3.6.

Let $V\in\Sigma_{\$}^{+}$ . Then $\sum_{i=1}^{j}\llbracket V\rrbracket[i]\leq\left|V\right|$ , where $j=\textup{select}_{\$}(V,1)-1$ if $\textup{rank}_{\$}(V,\left|V\right|)\geq 1$ , and $j=\left|V\right|$ otherwise.
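The signature encoding can be evaluated literally from its definition by counting $\infty$ symbols in two parent distance encodings. The Python sketch below does this for strings without $\$$ (an assumption; the $\$$ case of the definition is not covered), and it also lets one check the bound of Lemma 3.6 on small inputs. The function names are hypothetical.

```python
INF = float("inf")    # stand-in for the symbol "infinity"

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def signature(V):
    """Rotational Cartesian tree signature encoding of a string without $."""
    n, sig = len(V), []
    for i in range(1, n + 1):
        rot = V[i % n:] + V[:i % n]                        # Rot(V, i)
        before = encode(rot).count(INF)                    # INFs in <Rot(V, i)>
        after = encode([V[i - 1]] + rot)[1:].count(INF)    # INFs left after prepending V[i]
        sig.append(before - after)
    return sig

T3 = [4, 4, 7, 8]
print(signature(T3))                    # [1, 2, 1, 0], as in Figure 4
print(sum(signature(T3)) <= len(T3))    # True, consistent with Lemma 3.6
```

Intuitively, prepending $V[i]$ turns exactly those $\infty$ entries into finite distances whose symbols are at least $V[i]$, which is what the difference of the two counts measures.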

$i$	$T_{3}[i]$	$\textup{Rot}(T_{3},i)$	$\langle\textup{Rot}(T_{3},i)\rangle$	$\langle T_{3}[i]\cdot\textup{Rot}(T_{3},i)\rangle$	$\llbracket T_{3}\rrbracket[i]$
1	$\mathtt{4}$	$\mathtt{4784}$	$\mathtt{\infty 113}$	$\mathtt{\infty 1113}$	$\mathtt{1}$
2	$\mathtt{4}$	$\mathtt{7844}$	$\mathtt{\infty 1\infty 1}$	$\mathtt{\infty 1131}$	$\mathtt{2}$
3	$\mathtt{7}$	$\mathtt{8447}$	$\mathtt{\infty\infty 11}$	$\mathtt{\infty 1\infty 11}$	$\mathtt{1}$
4	$\mathtt{8}$	$\mathtt{4478}$	$\mathtt{\infty 111}$	$\mathtt{\infty\infty 111}$	$\mathtt{0}$

Figure 4: Rotational Cartesian tree signature encoding $\llbracket T_{3}\rrbracket$ of $T_{3}=\mathtt{4478}$.

Taking advantage of both encodings, we investigate how the $\omega$-preorder of two strings changes if we rotate them. For convenience, we borrow the following notation from the literature [14, 11]. Let $\pi(V)=\llbracket V\rrbracket[1]$ for every $V\in\Sigma_{\$}^{+}$, and $\textup{lcp}^{\infty}(U,W)=\textup{rank}_{\infty}(\langle U\rangle,\textup{lcp}(\langle U\rangle,\langle W\rangle))$ for each $U,W\in\Sigma_{\$}^{*}$. For example, $\pi(\mathtt{4478})=1$ and $\textup{lcp}^{\infty}(\mathtt{4478},\mathtt{7844})=1$.
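Both notations are easy to evaluate on small inputs. The following Python sketch computes $\pi$ and $\textup{lcp}^{\infty}$ directly from their definitions for strings without $\$$ (an assumption), reproducing the two example values; the function names `pi` and `lcp_inf` are hypothetical.

```python
INF = float("inf")    # stand-in for the symbol "infinity"

def encode(V):
    """Parent distance encoding for strings without $ (quadratic, for clarity)."""
    enc = []
    for i, v in enumerate(V):
        at_most = [j for j in range(i) if V[j] <= v]
        enc.append(i - at_most[-1] if at_most else INF)
    return enc

def pi(V):
    """First symbol of the signature encoding of V (V non-empty, without $)."""
    rot = V[1:] + V[:1]                                # Rot(V, 1)
    return encode(rot).count(INF) - encode([V[0]] + rot)[1:].count(INF)

def lcp_inf(U, W):
    """Number of INFs within the longest common prefix of <U> and <W>."""
    a, b = encode(U), encode(W)
    l = 0
    while l < min(len(a), len(b)) and a[l] == b[l]:
        l += 1
    return a[:l].count(INF)

print(pi([4, 4, 7, 8]))                      # 1
print(lcp_inf([4, 4, 7, 8], [7, 8, 4, 4]))   # 1
```

Here $\langle\mathtt{4478}\rangle$ and $\langle\mathtt{7844}\rangle$ share a common prefix of length two, which contains a single $\infty$.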

Lemma 3.7 ([17, Lemma 3]).

Let $V,U\in\Sigma_{\$}^{+}$ such that $\textup{Rot}(V,1)\prec_{\omega}\textup{Rot}(U,1)$ . Then $V\prec_{\omega}U$ if and only if (a) $\pi(V)=\$$ or (b) $\pi(V)\geq\min\{e,\pi(U)\}$ and $\pi(U)\neq\$$ , where $e=\textup{lcp}^{\infty}(\textup{Rot}(V,1),\textup{Rot}(U,1))$ .

Proof.

We only show the statement for the case $\min\{\pi(V),\pi(U)\}\in\Sigma$ , which is the non-trivial one. Let $z=\max\{\left|V\right|,\left|U\right|\}$ and $\lambda=\textup{lcp}(\langle\textup{Rot}(V,1)^{\omega}[..3z]\rangle,\langle\textup{Rot}(U,1)^{\omega}[..3z]\rangle)$ . Then $\langle\textup{Rot}(V,1)^{\omega}[..3z]\rangle<\langle\textup{Rot}(U,1)^{\omega}[..3z]\rangle$ by Corollary 3.4, $\lambda<3z$ , and $\textup{select}_{\infty}(\langle\textup{Rot}(V,1)\rangle,e)\leq\min\{\left|V\right|,\left|U\right|\}$ .

( $\Rightarrow$ ): Assume $\pi(V)<\min\{e,\pi(U)\}$ . Let $i=\textup{select}_{\infty}(\langle V\rangle,2)$ . Then $\langle V\rangle[..i-1]=\langle U\rangle[..i-1]$ and $\langle V\rangle[i]=\infty\neq i-1=\langle U\rangle[i]$ , i.e. $\langle U\rangle<\langle V\rangle$ . Consequently, $\langle U^{\omega}[..3z]\rangle<\langle V^{\omega}[..3z]\rangle$ , which implies $U\prec_{\omega}V$ by Corollary 3.4. The statement follows by contraposition.
( $\Leftarrow$ ): We exhaust all possible cases for $\pi(V)\geq\min\{e,\pi(U)\}$ .
Case 1.: Assume $\min\{e,\pi(V)\}>\pi(U)$ . Let $i=\textup{select}_{\infty}(\langle U\rangle,2)$ . Then $\langle V\rangle[..i-1]=\langle U\rangle[..i-1]$ and $\langle V\rangle[i]=i-1\neq\infty=\langle U\rangle[i]$ , i.e. $\langle V\rangle<\langle U\rangle$ . Consequently, $\langle V^{\omega}[..3z]\rangle<\langle U^{\omega}[..3z]\rangle$ , which implies $V\prec_{\omega}U$ by Corollary 3.4.
Case 2.: Assume $\min\{\pi(V),\pi(U)\}\geq e$ . Then we have $\langle V^{\omega}[..3z]\rangle[..\lambda+1]=\langle U^{\omega}[..3z]\rangle[..\lambda+1]$ . Since $\textup{Rot}(V,1)\prec_{\omega}\textup{Rot}(U,1)$ , $\lambda+2\leq 3z$ by Lemma 3.3. If $\langle\textup{Rot}(U,1)^{\omega}[..3z]\rangle[\lambda+1]=\infty$ , then $\langle V^{\omega}[..3z]\rangle[\lambda+2]\leq\lambda<\lambda+1=\langle U^{\omega}[..3z]\rangle[\lambda+2]$ . If $\langle\textup{Rot}(U,1)^{\omega}[..3z]\rangle[\lambda+1]\neq\infty$ , then $\langle V^{\omega}[..3z]\rangle[\lambda+2]=\langle\textup{Rot}(V,1)^{\omega}[..3z]\rangle[\lambda+1]<\langle\textup{Rot}(U,1)^{\omega}[..3z]\rangle[\lambda+1]=\langle U^{\omega}[..3z]\rangle[\lambda+2]$ . Thus, $V\prec_{\omega}U$ by Corollary 3.4.
Case 3.: Assume $e>\pi(V)=\pi(U)$ . Then $\langle V^{\omega}[..3z]\rangle[..\lambda+1]=\langle U^{\omega}[..3z]\rangle[..\lambda+1]$ and $\lambda\leq z$ . Moreover, $\langle V^{\omega}[..3z]\rangle[\lambda+2]=\langle\textup{Rot}(V,1)^{\omega}[..3z]\rangle[\lambda+1]<\langle\textup{Rot}(U,1)^{\omega}[..3z]\rangle[\lambda+1]=\langle U^{\omega}[..3z]\rangle[\lambda+2]$ . Hence, $V\prec_{\omega}U$ by Corollary 3.4.

$\hfill\blacktriangleleft$

Our representation of the LF- and FL-mapping consists of two strings $\textup{L}_{\mathcal{T}}$ and $\textup{F}_{\mathcal{T}}$ , which are defined as follows. First, $\textup{L}_{\mathcal{T}}$ is the string of length $n$ over $\Sigma_{\$}$ such that $\textup{L}_{\mathcal{T}}[i]=\pi(C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[\textup{LF}_{\mathcal{T}}[i]]}))=\llbracket C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})\rrbracket[\left|C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})\right|]$ for each $i\in[1..n]$ .

Corollary 3.8.

Let $\emptyset\neq\mathcal{T}\subset\Sigma_{\$}^{+}$ , $n$ the accumulated length of all texts in $\mathcal{T}$ , $i,j\in[1..n]$ , and $i<j$ . If $\textup{L}_{\mathcal{T}}[i]=\textup{L}_{\mathcal{T}}[j]$ , then $\textup{LF}_{\mathcal{T}}[i]<\textup{LF}_{\mathcal{T}}[j]$ .

Second, $\textup{F}_{\mathcal{T}}$ is the string of length $n$ over $\Sigma_{\$}$ such that $\textup{F}_{\mathcal{T}}[\textup{LF}_{\mathcal{T}}[i]]=\textup{L}_{\mathcal{T}}[i]$ for each $i\in[1..n]$ . By what follows, $\textup{L}_{\mathcal{T}}$ and $\textup{F}_{\mathcal{T}}$ suffice to compute both LF- and FL-mapping of $\mathcal{T}$ .

Corollary 3.9.

Let $\emptyset\neq\mathcal{T}\subset\Sigma_{\$}^{+}$ , $n$ the accumulated length of all texts in $\mathcal{T}$ , and $i\in[1..n]$ . Then $\textup{LF}_{\mathcal{T}}[i]=\textup{select}_{\textup{L}_{\mathcal{T}}[i]}(\textup{F}_{\mathcal{T}},\textup{rank}_{\textup{L}_{\mathcal{T}}[i]}(\textup{L}_{\mathcal{T}},i))$ and $\textup{FL}_{\mathcal{T}}[i]=\textup{select}_{\textup{F}_{\mathcal{T}}[i]}(\textup{L}_{\mathcal{T}},\textup{rank}_{\textup{F}_{\mathcal{T}}[i]}(\textup{F}_{\mathcal{T}},i))$ .

In Figure 5 we present $\textup{F}_{\mathcal{T}}$ and $\textup{L}_{\mathcal{T}}$ of our running example.
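As an illustration of Corollary 3.9, the following sketch recomputes the LF column of Figure 5 from the F and L columns alone. It uses naive linear-time `rank` and `select` helpers on Python lists instead of the data structure of Lemma 2.1; these helpers are stand-ins for illustration, not part of the index.

```python
def rank(s, c, i):
    """Number of occurrences of c in s[1..i] (1-based, inclusive)."""
    return s[:i].count(c)

def select(s, c, k):
    """1-based position of the k-th occurrence of c in s."""
    seen = 0
    for pos, x in enumerate(s, start=1):
        if x == c:
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k occurrences of c")

def lf(F, L, i):
    """LF[i] = select_{L[i]}(F, rank_{L[i]}(L, i))."""
    return select(F, L[i - 1], rank(L, L[i - 1], i))

def fl(F, L, i):
    """FL[i] = select_{F[i]}(L, rank_{F[i]}(F, i))."""
    return select(L, F[i - 1], rank(F, F[i - 1], i))

# F and L columns of Figure 5.
F = [1, 2, 2, 2, 2, 1, 1, 0, 0, 0, 0]
L = [0, 1, 0, 0, 0, 2, 2, 1, 1, 2, 2]

LF = [lf(F, L, i) for i in range(1, len(L) + 1)]
print(LF)  # [8, 1, 9, 10, 11, 2, 3, 6, 7, 4, 5]
```

The result coincides with the LF column of Figure 5, and `fl` inverts `lf` because both map the $k$-th occurrence of a symbol in one string to the $k$-th occurrence of the same symbol in the other.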

$\blacktriangleright$ Remark 3.10.

Let $d=1$ , $\textup{rank}_{\$}(T_{1},n_{1})=1$ and $T_{1}[n_{1}]=\$$ . If we substitute the occurrence of $\$$ in $\textup{F}_{\mathcal{T}}$ and $\textup{L}_{\mathcal{T}}$ for $-1$ , then the modified strings constitute the integer-based representation of Kim and Cho’s index [17, Section 4].

3.3 Backward Search

The LF-mapping can be leveraged for the backward search by the following result, which is due to Lemma 2.2 and Corollary 3.5.

Lemma 3.11 ([17, Lemma 6]).

Let $\emptyset\neq\mathcal{T}\subset\Sigma_{\$}^{+}$ , $P\in\Sigma^{+}$ , $\left|P\right|=m$ , $i\in[1..m]$ , $h=\pi(P[i..]\cdot\$)$ and $e=\textup{rank}_{\infty}(\langle P[i..]\rangle,m-i+1)$ . For $j\in\textup{CR}_{\mathcal{T}}(P[i+1..])$ , $\textup{LF}_{\mathcal{T}}[j]\in\textup{CR}_{\mathcal{T}}(P[i..])$ if and only if (a) $e>1$ and $\textup{L}_{\mathcal{T}}[j]=h$ or (b) $e=1$ and $\textup{L}_{\mathcal{T}}[j]\geq h$ .

At this point it is straightforward to apply the techniques developed by Kim and Cho [17, Sections 5 and 6] to define a static index of $\mathcal{T}\subset\Sigma^{+}$ that occupies $3n+o(n)$ bits of space and that solves Count in $O(m)$ time, where $m$ is the pattern length. However, for brevity and in view of a space-efficient construction of our proposed index and its extension, we will represent $\textup{F}_{\mathcal{T}}$ and $\textup{L}_{\mathcal{T}}$ by the dynamic data structure of Lemma 2.1, and introduce an auxiliary string for the backward search. Let $\textup{LCP}_{\mathcal{T}}^{\infty}$ denote a string of length $n$ over $\Sigma$ such that $\textup{LCP}_{\mathcal{T}}^{\infty}[1]=0$ and $\textup{LCP}_{\mathcal{T}}^{\infty}[i]=\textup{lcp}^{\infty}(C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]}),C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i-1]}))$ for each $i\in[2..n]$ .

Lemma 3.12.

Let $\emptyset\neq\mathcal{T}=\{T_{1},...,T_{d}\}\subset\Sigma_{\$}^{+}$ , $n=\left|T_{1}\cdots T_{d}\right|$ , $i,j\in[1..n]$ , and $i<j$ . Then $\textup{lcp}^{\infty}(C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]}),C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[j]}))=\min\{\textup{LCP}_{\mathcal{T}}^{\infty}[k]\mid k\in[i+1..j]\}=\textup{RNV}_{\textup{LCP}_{\mathcal{T}}^{\infty}}(i+1,j,-1)$ .
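Lemma 3.12 can be spot-checked on the running example of Figure 5. The sketch below compares, for every pair of ranks $i<j$, the directly computed $\textup{lcp}^{\infty}$ of the two conjugates with the range query on the $\textup{LCP}_{\mathcal{T}}^{\infty}$ column. It assumes the distance-based reading of $\langle\cdot\rangle$ used in the figures and implements RNV naively as "smallest entry in the range that exceeds the threshold"; the function names are illustrative.

```python
INF = float("inf")

def signature(v):
    """<v>[i]: distance to the nearest earlier value <= v[i], INF if none."""
    out = []
    for i, x in enumerate(v):
        dists = [i - j for j in range(i) if v[j] <= x]
        out.append(min(dists) if dists else INF)
    return out

def lcp_inf(u, w):
    """Number of INF entries inside the common prefix of <u> and <w>."""
    su, sw = signature(u), signature(w)
    k = 0
    while k < min(len(su), len(sw)) and su[k] == sw[k]:
        k += 1
    return su[:k].count(INF)

def rnv(arr, lo, hi, t):
    """Naive range next value: smallest entry of arr[lo..hi] (1-based) above t."""
    return min(x for x in arr[lo - 1 : hi] if x > t)

# Conjugates in rank order and the LCP^inf column, both taken from Figure 5.
CONJ = [[4, 4, 7, 8], [4, 7, 8, 4], [1, 2, 5], [3, 6, 3, 5], [3, 5, 3, 6],
        [7, 8, 4, 4], [2, 5, 1], [8, 4, 4, 7], [5, 1, 2], [5, 3, 6, 3],
        [6, 3, 5, 3]]
LCP = [0, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2]

mismatches = [(i, j)
              for i in range(1, 12) for j in range(i + 1, 12)
              if lcp_inf(CONJ[i - 1], CONJ[j - 1]) != rnv(LCP, i + 1, j, -1)]
print(mismatches)  # []
```

Since all entries of $\textup{LCP}_{\mathcal{T}}^{\infty}$ are non-negative, the threshold $-1$ turns the range next value query into a plain range minimum.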

Lemma 3.13.

Let $\emptyset\neq\mathcal{T}\subset\Sigma_{\$}^{+}$ , $P\in\Sigma^{+}$ , $m=\left|P\right|$ , $i\in[1..m]$ , $[\ell..r]=\textup{CR}_{\mathcal{T}}(P[i+1..])$ , $h=\pi(P[i..]\cdot\$)$ , and $e=\textup{rank}_{\infty}(\langle P[i..]\rangle,m-i+1)$ . Then Algorithm 1 correctly computes $[\ell^{\prime}..r^{\prime}]=\textup{CR}_{\mathcal{T}}(P[i..])$ .

Algorithm 1 Computing the conjugate range $\textup{CR}_{\mathcal{T}}(P[i..])$. Here, $\emptyset\neq\mathcal{T}\subset\Sigma_{\$}^{+}$, $P\in\Sigma^{+}$, $m=\left|P\right|$, $i\in[1..m]$, $[\ell..r]=\textup{CR}_{\mathcal{T}}(P[i+1..])$, $h=\pi(P[i..]\cdot\$)$, and $e=\textup{rank}_{\infty}(\langle P[i..]\rangle,m-i+1)$.

Proof.

Let $c=\left|\textup{CR}_{\mathcal{T}}(P[i..])\right|$ . We show how Algorithm 1 computes $c$ and $r^{\prime}$ to obtain $[\ell^{\prime}..r^{\prime}]$ .

Case 1.

Assume $e>1$ . This case is handled from Line 3 through Line 5. By Lemma 3.11, $\textup{LF}_{\mathcal{T}}[j]\in\textup{CR}_{\mathcal{T}}(P[i..])$ if and only if $\textup{L}_{\mathcal{T}}[j]=h$ for each $j\in[\ell..r]$ . We compute $c$ in Line 4 and return an empty interval in Line 13 due to Line 2 if $c\leq 0$ . Assume $c\geq 1$ . We compute the largest $x\in[\ell..r]$ such that $\textup{L}_{\mathcal{T}}[x]=h$ , and then $r^{\prime}=\textup{LF}_{\mathcal{T}}[x]$ in Line 5, with correctness following by Corollary 3.8.

Case 2.

Assume $e=1$ . This case is handled from Line 6 through Line 12. By Lemma 3.11, $\textup{LF}_{\mathcal{T}}[j]\in\textup{CR}_{\mathcal{T}}(P[i..])$ if and only if $\textup{L}_{\mathcal{T}}[j]\geq h$ for each $j\in[\ell..r]$ . Thus, we correctly compute $c$ in Line 7, and return an empty interval in Line 13 due to Line 2 if $c\leq 0$ . Assume $c\geq 1$ . We compute the lowest value $v$ in $\textup{L}_{\mathcal{T}}[\ell..r]$ greater than $h-1$ in Line 8 and determine the largest $x\in[\ell..r]$ such that $\textup{L}_{\mathcal{T}}[x]=v$ in Line 9. Let $[\ell^{\prime\prime}..r^{\prime\prime}]$ denote the interval computed in Line 10. Then $\textup{lcp}^{\infty}(C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[j]}),C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[k]}))\geq v+1$ for each $j,k\in[\ell^{\prime\prime}..r^{\prime\prime}]$ by Lemma 3.12. It remains to compute $y=\textup{LF}_{\mathcal{T}}[x]-\ell^{\prime}$ , i.e., the number of positions $j\in[\ell..r]$ with $\textup{L}_{\mathcal{T}}[j]\geq h$ and $\textup{LF}_{\mathcal{T}}[j]<\textup{LF}_{\mathcal{T}}[x]$ . The following statements hold due to Lemma 3.7.

$\blacksquare$

If $j\in[\ell..x-1]$ satisfies $\textup{L}_{\mathcal{T}}[j]\geq h$ , then $\textup{LF}_{\mathcal{T}}[j]<\textup{LF}_{\mathcal{T}}[x]$ since $v\leq\textup{L}_{\mathcal{T}}[j]$ .
$\blacksquare$

If $j\in[x+1..r^{\prime\prime}]$ satisfies $\textup{L}_{\mathcal{T}}[j]\geq h$ , then $\textup{LF}_{\mathcal{T}}[j]<\textup{LF}_{\mathcal{T}}[x]$ since $v<\textup{L}_{\mathcal{T}}[j]$ and $v<v+1\leq\textup{lcp}^{\infty}(C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[j]})% ,C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[x]}))$ .
$\blacksquare$

If $j\in[r^{\prime\prime}+1..r]$ satisfies $\textup{L}_{\mathcal{T}}[j]\geq h$ , then $\textup{LF}_{\mathcal{T}}[j]>\textup{LF}_{\mathcal{T}}[x]$ since we have $v\geq\textup{lcp}^{\infty}(C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[j]}),C_{% \mathcal{T}}({\textup{CA}_{\mathcal{T}}[x]}))$ .

We apply these results to compute $y$ in Line 11, and then infer $r^{\prime}=\textup{LF}_{\mathcal{T}}[x]+c-(y+1)$ in Line 12.

$\hfill\blacktriangleleft$

The next result is obtained from the computation of Demaine and colleagues’ [3] Cartesian tree signature encoding of the given string.

Lemma 3.14.

Given $P\in\Sigma_{\$}^{*}$ of length $m$ , we can process $P$ in $O(m)$ time such that we can subsequently compute $\pi(P[i..]\cdot\$)$ and $\textup{rank}_{\infty}(\langle P[i..]\rangle,m-i+1)$ in $O(1)$ time, for every $i\in[1..m]$ .
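One possible way to realise the preprocessing of Lemma 3.14 for a $-free pattern is a monotone stack that scans $P$ from right to left. This is a sketch under assumptions, not necessarily the procedure the lemma has in mind: it relies on the distance-based reading of $\langle\cdot\rangle$, under which the $\infty$ positions of $\langle P[i..]\rangle$ are exactly the strict left-to-right minima of $P[i..]$, and on $\$$ being smaller than every symbol of $\Sigma$. Under these assumptions, prepending $P[i]$ removes precisely those minima of $P[i+1..]$ with value at least $P[i]$, and their number is $\pi(P[i..]\cdot\$)$. The function name `preprocess` is hypothetical.

```python
def preprocess(P):
    """Return (h, e) with h[i-1] = pi(P[i..] $) and e[i-1] = rank_inf(<P[i..]>)
    for a $-free pattern P, in O(len(P)) total time."""
    m = len(P)
    h = [0] * m
    e = [0] * m
    stack = []  # values at the INF positions of <P[i+1..]>, leftmost on top
    for i in range(m - 1, -1, -1):
        popped = 0
        # Minima with value >= P[i] stop being INF once P[i] is prepended.
        while stack and stack[-1] >= P[i]:
            stack.pop()
            popped += 1
        stack.append(P[i])      # P[i] itself is the first INF of <P[i..]>
        h[i] = popped
        e[i] = len(stack)
    return h, e

print(preprocess([4, 4, 7, 8]))  # ([1, 1, 1, 0], [1, 1, 1, 1])
print(preprocess([5, 3, 6, 3]))  # ([0, 2, 0, 0], [2, 1, 2, 1])
```

Each position is pushed and popped at most once, which gives the linear preprocessing time; afterwards both values are available by a table lookup.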

Theorem 3.15.

Let $\emptyset\neq\mathcal{T}=\{T_{1},...,T_{d}\}\subset\Sigma_{\$}^{+}$ and $n=\left|T_{1}\cdots T_{d}\right|$ . There exists a data structure occupying $O(n\lg\sigma)$ bits of space that solves Count in $O(m\frac{\lg\sigma\lg n}{\lg\lg n})$ time, where $m$ is the pattern length.

Proof.

We represent $\textup{F}_{\mathcal{T}}$ , $\textup{L}_{\mathcal{T}}$ and $\textup{LCP}_{\mathcal{T}}^{\infty}$ by the data structure of Lemma 2.1, which leads to the claimed space complexity. We preprocess a pattern $P\in\Sigma^{+}$ of length $m$ with Lemma 3.14, and compute $\textup{CR}_{\mathcal{T}}(P[i..])$ from $\textup{CR}_{\mathcal{T}}(P[i+1..])$ for each $i\in[1..m]$ in descending order leveraging Lemma 3.13. Since each conjugate range update takes $O(1)$ queries, the claimed complexity for solving Count follows from Lemma 2.1. $\hfill\blacktriangleleft$

We call the data structure of Theorem 3.15 the cBWT index of $\mathcal{T}$ . The cBWT index of the running example is presented in Figure 5.

$i$	$\textup{CA}_{\mathcal{T}}[i]$	$C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})$	$\textup{LF}_{\mathcal{T}}[i]$	$\textup{F}_{\mathcal{T}}[i]$	$\textup{L}_{\mathcal{T}}[i]$	$\textup{LCP}_{\mathcal{T}}^{\infty}[i]$	$\langle C_{\mathcal{T}}({\textup{CA}_{\mathcal{T}}[i]})\rangle$
1	$\mathtt{8}$	$\mathtt{4478}$	$\mathtt{8}$	$\mathtt{1}$	$\mathtt{0}$	$\mathtt{0}$	$\mathtt{\infty 111}$
2	$\mathtt{9}$	$\mathtt{4784}$	$\mathtt{1}$	$\mathtt{2}$	$\mathtt{1}$	$\mathtt{1}$	$\mathtt{\infty 113}$
3	$\mathtt{2}$	$\mathtt{125}$	$\mathtt{9}$	$\mathtt{2}$	$\mathtt{0}$	$\mathtt{1}$	$\mathtt{\infty 11}$
4	$\mathtt{5}$	$\mathtt{3635}$	$\mathtt{10}$	$\mathtt{2}$	$\mathtt{0}$	$\mathtt{1}$	$\mathtt{\infty 121}$
5	$\mathtt{7}$	$\mathtt{3536}$	$\mathtt{11}$	$\mathtt{2}$	$\mathtt{0}$	$\mathtt{1}$	$\mathtt{\infty 121}$
6	$\mathtt{10}$	$\mathtt{7844}$	$\mathtt{2}$	$\mathtt{1}$	$\mathtt{2}$	$\mathtt{1}$	$\mathtt{\infty 1\infty 1}$
7	$\mathtt{3}$	$\mathtt{251}$	$\mathtt{3}$	$\mathtt{1}$	$\mathtt{2}$	$\mathtt{2}$	$\mathtt{\infty 1\infty}$
8	$\mathtt{11}$	$\mathtt{8447}$	$\mathtt{6}$	$\mathtt{0}$	$\mathtt{1}$	$\mathtt{1}$	$\mathtt{\infty\infty 11}$
9	$\mathtt{1}$	$\mathtt{512}$	$\mathtt{7}$	$\mathtt{0}$	$\mathtt{1}$	$\mathtt{2}$	$\mathtt{\infty\infty 1}$
10	$\mathtt{4}$	$\mathtt{5363}$	$\mathtt{4}$	$\mathtt{0}$	$\mathtt{2}$	$\mathtt{2}$	$\mathtt{\infty\infty 12}$
11	$\mathtt{6}$	$\mathtt{6353}$	$\mathtt{5}$	$\mathtt{0}$	$\mathtt{2}$	$\mathtt{2}$	$\mathtt{\infty\infty 12}$

Figure 5: The cBWT index of our running example $\mathcal{T}=\{\mathtt{512},\mathtt{5363},\mathtt{4478}\}$.
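The $\textup{CA}_{\mathcal{T}}$ column of Figure 5 can be reproduced by a brute-force sort. The sketch below assumes, in the spirit of how Corollary 3.4 is used in the proof of Lemma 3.7, that two conjugates can be compared in $\omega$-preorder by comparing the signature encodings of the length-$3z$ prefixes of their infinite powers, where $z$ is the maximum text length and $\infty$ is larger than every integer. It further assumes that ties are broken by text position, which is consistent with Figure 5. None of this is meant as the construction algorithm of the index.

```python
INF = float("inf")

def signature(v):
    """<v>[i]: distance to the nearest earlier value <= v[i], INF if none."""
    out = []
    for i, x in enumerate(v):
        dists = [i - j for j in range(i) if v[j] <= x]
        out.append(min(dists) if dists else INF)
    return out

def omega_key(c, z):
    """Signature encoding of the length-3z prefix of the infinite power of c."""
    reps = (3 * z) // len(c) + 1
    return signature((c * reps)[: 3 * z])

texts = [[5, 1, 2], [5, 3, 6, 3], [4, 4, 7, 8]]
z = max(len(t) for t in texts)

# Enumerate all conjugates together with their start position in T1...Td.
conjugates = []
pos = 1
for t in texts:
    for k in range(len(t)):
        conjugates.append((pos, t[k:] + t[:k]))
        pos += 1

# sorted() is stable, so conjugates with equal keys keep their text order.
ranked = sorted(conjugates, key=lambda pc: omega_key(pc[1], z))
CA = [p for p, _ in ranked]
print(CA)  # [8, 9, 2, 5, 7, 10, 3, 11, 1, 4, 6]
```

The two ties in this example are the pairs $\mathtt{3635}$, $\mathtt{3536}$ and $\mathtt{5363}$, $\mathtt{6353}$, whose infinite powers have identical signature encodings.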

4 Construction in $O(n\lg\sigma)$ Bits of Space

We will show how to construct the cBWT index of a single text, and how an existing cBWT index can be extended to also index an additional text. We then leverage these two results to iteratively construct the cBWT index of any set of texts, adding one new text per iteration step.

4.1 Single Text cBWT Index

Before we can tackle the construction of the cBWT index of a single text, we need another technical result, which follows from an examination of the definitions and Lemma 3.7.

Lemma 4.1.

Let $V,U\in\Sigma_{\$}^{+}$ , $\textup{Rot}(V,1)\prec_{\omega}\textup{Rot}(U,1)$ , and $e=\textup{lcp}^{\infty}(\textup{Rot}(V,1),\textup{Rot}(U,1))$ . Then

\textup{lcp}^{\infty}(V,U)=\begin{cases*}0&\text{if $\pi(U)=\$$ or $\pi(V)=\$$,}\\ e-\pi(V)+1&\text{if $V\prec_{\omega}U$ and $\$\neq\pi(V)=\pi(U)<e$,}\\ 1&\text{otherwise.}\end{cases*}
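The case distinction of Lemma 4.1 can be tried out on conjugates of the running example. The sketch below covers $-free strings only, so the first case never applies, and it takes the truth value of $V\prec_{\omega}U$ as an input read off the ranks in Figure 5 instead of computing the $\omega$-preorder. The three pairs were chosen such that $\textup{Rot}(V,1)$ precedes $\textup{Rot}(U,1)$ in Figure 5. Function names are illustrative.

```python
INF = float("inf")

def signature(v):
    """<v>[i]: distance to the nearest earlier value <= v[i], INF if none."""
    out = []
    for i, x in enumerate(v):
        dists = [i - j for j in range(i) if v[j] <= x]
        out.append(min(dists) if dists else INF)
    return out

def lcp_inf(u, w):
    """Number of INF entries inside the common prefix of <u> and <w>."""
    su, sw = signature(u), signature(w)
    k = 0
    while k < min(len(su), len(sw)) and su[k] == sw[k]:
        k += 1
    return su[:k].count(INF)

def rot1(v):
    """Rot(v, 1): left rotation by one position."""
    return v[1:] + v[:1]

def pi(v):
    """pi(v) = [[v]][1] for a $-free v."""
    r = rot1(v)
    return signature(r).count(INF) - signature(v[:1] + r)[1:].count(INF)

def lemma_4_1(v, u, v_before_u):
    """Right-hand side of Lemma 4.1 for $-free v and u."""
    e = lcp_inf(rot1(v), rot1(u))
    if v_before_u and pi(v) == pi(u) and pi(v) < e:
        return e - pi(v) + 1
    return 1

# (V, U, does V precede U in Figure 5?)
pairs = [([4, 4, 7, 8], [4, 7, 8, 4], True),
         ([5, 1, 2], [5, 3, 6, 3], True),
         ([8, 4, 4, 7], [4, 4, 7, 8], False)]
results = [(lemma_4_1(v, u, flag), lcp_inf(v, u)) for v, u, flag in pairs]
print(results)  # [(1, 1), (2, 2), (1, 1)]
```

In the second pair, $\pi(\mathtt{512})=\pi(\mathtt{5363})=0<e=1$, so the middle case yields $e-\pi(V)+1=2$, which agrees with $\textup{LCP}_{\mathcal{T}}^{\infty}[10]=2$ in Figure 5.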

We will first show how to construct the cBWT index of a text from $\Sigma_{\$}^{+}$ in which $\$$ occurs exactly once, and then show how to construct the cBWT index of an arbitrary text from $\Sigma^{+}$ (without the requirement on $\$$ ).

Lemma 4.2.

Let $R\in\Sigma_{\$}^{+}$ , $\rho=\left|R\right|\geq 2$ , $R[\rho]=\$$ , and $\textup{rank}_{\$}(R,\rho)=1$ . Algorithm 2 correctly computes $\textup{CA}_{\{bR\}}^{-1}[1]$ and the cBWT index of $\{bR\}$ for each $b\in\Sigma$ .

Algorithm 2 Computing $c+1=\textup{CA}_{\{bR\}}^{-1}[1]$ and updating the cBWT of $\{R\}$ to that of $\{bR\}$ for $b\in\Sigma$. Here, $R\in\Sigma_{\$}^{+}$, $\rho=\left|R\right|$, $R[\rho]=\$$, $\textup{rank}_{\$}(R,\rho)=1$, and $y=\textup{CA}_{\{R\}}^{-1}[1]$.

Proof.

For each $i\in[0..\pi(bR)]$ , let $J_{i}=[\ell_{i}..r_{i}]$ maximal such that $r_{i}\leq\textup{CA}_{\{R\}}^{-1}[1]$ and $\textup{lcp}^{\infty}(R,C_{\{R\}}({\textup{CA}_{\{R\}}[j]}))=i$ for each $j\in J_{i}$ , and $J_{\pi(bR)+1}=[\ell_{\pi(bR)+1}..r_{\pi(bR)+1}]$ maximal such that $\textup{lcp}^{\infty}(R,C_{\{R\}}({\textup{CA}_{\{R\}}[j]}))\geq\pi(bR)+1$ for each $j\in J_{\pi(bR)+1}$ , i.e., the left and right boundary of the former are computed in Line 6 and Line 5, respectively, and the latter in Line 2. The following statements are due to the location of the single $\$$ in each conjugate of $R$ and Lemma 3.7.

$\blacksquare$

If $j\in[r_{\pi(bR)+1}+1..\rho]$ , then $bR\prec_{\omega}C_{\{R\}}({\textup{CA}_{\{R\}}[\textup{LF}_{\{R\}}[j]]})$ .
$\blacksquare$

If $j\in[\textup{CA}_{\{R\}}^{-1}[1]+1..r_{\pi(bR)+1}]$ , then $C_{\{R\}}({\textup{CA}_{\{R\}}[\textup{LF}_{\{R\}}[j]]})\prec_{\omega}bR$ if and only if $\textup{L}_{\{R\}}[j]\geq\pi(bR)+1$ .
$\blacksquare$

If $j=\textup{CA}_{\{R\}}^{-1}[1]$ , then $C_{\{R\}}({\textup{CA}_{\{R\}}[\textup{LF}_{\{R\}}[j]]})=(\$\cdot R)[..\rho]\prec_{\omega}bR$ .
$\blacksquare$

If $j\in[\ell_{\pi(bR)+1}..\textup{CA}_{\{R\}}^{-1}[1]-1]$ , then $C_{\{R\}}({\textup{CA}_{\{R\}}[\textup{LF}_{\{R\}}[j]]})\prec_{\omega}bR$ if and only if $\textup{L}_{\{R\}}[j]\geq\pi(bR)$ .
$\blacksquare$

If $i\in[0..\pi(bR)]$ and $j\in J_{i}$ , then $C_{\{R\}}({\textup{CA}_{\{R\}}[\textup{LF}_{\{R\}}[j]]})\prec_{\omega}bR$ if and only if $\textup{L}_{\{R\}}[j]\geq i$ .

We apply these results from Line 2 through Line 6 to compute the number $c$ of conjugates of $R$ smaller than $bR$ according to $\omega$ -preorder. Since $C_{\{R\}}({k})[..\textup{select}_{\$}(C_{\{R\}}({k}),1)]=C_{\{bR\}}({k+1})[..\textup{select}_{\$}(C_{\{R\}}({k}),1)]$ for each $k\in[1..\rho]$ by assumption on $R$ and $b$ ,

C_{\{bR\}}({\textup{CA}_{\{bR\}}[i]})[..x]=\begin{cases}C_{\{R\}}({\textup{CA}_{\{R\}}[i]})[..x]&\text{if $i\in[1..c]$,}\\ bR&\text{if $i=c+1$,}\\ C_{\{R\}}({\textup{CA}_{\{R\}}[i-1]})[..x]&\text{otherwise,}\end{cases}

for each $i\in[1..\rho+1]$ with $x=\textup{select}_{\$}(C_{\{bR\}}({\textup{CA}_{\{bR\}}[i]}),1)$ . Thus, $c$ is also the number of conjugates of $b R$ smaller than $b R$ according to $\omega$ -preorder. Subsequently, Algorithm 2 updates the cBWT index of $\{R\}$ from Line 8 through Line 17. By assumption on $b$ and $R$ , $\textup{LCP}_{\{R\}}^{\infty}[..c]=\textup{LCP}_{\{bR\}}^{\infty}[..c]$ and $\textup{LCP}_{\{R\}}^{\infty}[c+2..]=\textup{LCP}_{\{bR\}}^{\infty}[c+3..]$ . From Line 8 through Line 14 we leverage Lemma 3.12 and Lemma 4.1 to compute $\textup{LCP}_{\{bR\}}^{\infty}[c+1]$ and, if $c<\rho$ , $\textup{LCP}_{\{bR\}}^{\infty}[c+2]$ , and update $\textup{LCP}_{\{R\}}^{\infty}$ . By assumption on $R$ , $\pi(C_{\{R\}}({j}))=\pi(C_{\{bR\}}({j+1}))$ for each $j\in[1..\rho]$ . Consequently, $\textup{F}_{\{R\}}[..c]=\textup{F}_{\{bR\}}[..c]$ and $\textup{F}_{\{R\}}[c+1..]=\textup{F}_{\{bR\}}[c+2..]$ . Moreover, if we set $\textup{L}_{\{R\}}[\textup{CA}_{\{R\}}^{-1}[1]]=\pi(bR)$ , then $\textup{L}_{\{R\}}[..c]=\textup{L}_{\{bR\}}[..c]$ and $\textup{L}_{\{R\}}[c+1..]=\textup{L}_{\{bR\}}[c+2..]$ . It remains to compute $\textup{F}_{\{bR\}}[c+1]$ and $\textup{L}_{\{bR\}}[c+1]$ , which are $\pi(bR)$ and $\$$ , respectively. The update of both $\textup{F}_{\{R\}}$ and $\textup{L}_{\{R\}}$ is done from Line 15 through Line 17. $\hfill\blacktriangleleft$

Corollary 4.3.

Let $R\in\Sigma_{\$}^{+}$ , $\rho=\left|R\right|$ , $R[\rho]=\$$ , and $\textup{rank}_{\$}(R,\rho)=1$ . The cBWT index of $\{R\}$ can be constructed in $O(\rho\lg\sigma)$ bits of space and $O(\rho\frac{\lg\sigma\lg\rho}{\lg\lg\rho})$ time.

Proof.

We show the case where $\rho\geq 2$ . We represent $\textup{F}_{\{R[\rho-1..]\}}=\mathtt{\$0}$ , $\textup{L}_{\{R[\rho-1..]\}}=\mathtt{0\$}$ and $\textup{LCP}_{\{R[\rho-1..]\}}^{\infty}=\mathtt{00}$ by the data structure of Lemma 2.1. Initially, $\textup{CA}_{\{R[\rho-1..]\}}^{-1}[1]=2$ . We preprocess $R$ with Lemma 3.14 in $O(\rho)$ time to have access to $\pi(R[i..])=\pi(\textup{Rot}(R,i-1))$ for each $i\in[1..\rho]$ in $O(1)$ time. For each $i\in[1..\rho-2]$ in descending order, we apply Algorithm 2 to compute $\textup{CA}_{\{R[i..]\}}^{-1}[1]$ and extend the cBWT index of $\{R[i+1..]\}$ to that of $\{R[i..]\}$ . Since the extension of the cBWT index of $\{R[i+1..]\}$ to that of $\{R[i..]\}$ takes $O(\pi(R[i..]))$ queries for each $i\in[1..\rho-2]$ , the construction of the cBWT index of $\{R[1..]\}=\{R\}$ takes a total of $O(\rho)$ queries by Lemma 3.6. The claimed complexities then follow by Lemma 2.1. $\hfill\blacktriangleleft$

$\blacktriangleright$ Remark 4.4.

Due to Remark 3.10, the previous statement yields the integer-based representation of Kim and Cho’s index [17, Section 4] by substituting $\$$ with $-1$ in both $\textup{F}_{\mathcal{T}}$ and $\textup{L}_{\mathcal{T}}$ . Thus, we can construct their compact index [17, Section 5] directly [17, Section A.4] while retaining the complexities as stated in Corollary 4.3.

Corollary 4.5.

Let $T\in\Sigma^{+}$ and $\left|T\right|=n$ . Then the cBWT index of $\{T\}$ can be constructed in $O(n\lg\sigma)$ bits of space and $O(n\frac{\lg\sigma\lg n}{\lg\lg n})$ time.

Proof.

Let $R=T^{4}\$$ . We construct the cBWT index of $\{R\}$ within the claimed complexities by Corollary 4.3. During construction, we build a bit string $Y$ of length $4n+1$ such that $Y[i]=1$ if and only if $\textup{CA}_{\{R\}}[i]\in[2..n+1]$ . By choice of $R$ , $C_{\{T\}}({i})^{\omega}[..3n]=C_{\{R\}}({i})[..3n]$ for each $i\in[1..n]$ , and $C_{\{T\}}({1})^{\omega}[..3n]=C_{\{R\}}({n+1})[..3n]$ . The following statements hold for each $i\in[1..n]$ by choice of $Y$ and Lemma 3.3.

$\blacksquare$

$C_{\{T\}}({\textup{CA}_{\{T\}}[i]})^{\omega}[..3n]=C_{\{R\}}({\textup{CA}_{\{R% \}}[\textup{select}_{1}(Y,i)]})[..3n]$ .
$\blacksquare$

$\pi(C_{\{T\}}({\textup{CA}_{\{T\}}[i]}))=\pi(C_{\{R\}}({\textup{CA}_{\{R\}}[% \textup{select}_{1}(Y,i)]}))$ .
$\blacksquare$

$\pi(C_{\{T\}}({\textup{CA}_{\{T\}}[\textup{LF}_{\{T\}}[i]]}))=\pi(C_{\{R\}}({% \textup{CA}_{\{R\}}[\textup{LF}_{\{R\}}[\textup{select}_{1}(Y,i)]]}))$ .

Thus, $\textup{F}_{\{T\}}[i]=\textup{F}_{\{R\}}[\textup{select}_{1}(Y,i)]$ and $\textup{L}_{\{T\}}[i]=\textup{L}_{\{R\}}[\textup{select}_{1}(Y,i)]$ for each $i\in[1..n]$ . Moreover, $\textup{LCP}_{\{T\}}^{\infty}[i]=\textup{RNV}_{\textup{LCP}_{\{R\}}^{\infty}}(% \textup{select}_{1}(Y,i-1)+1,\textup{select}_{1}(Y,i),-1)$ for each $i\in[2..n]$ due to Lemma 3.12, and $\textup{LCP}_{\{T\}}^{\infty}[1]=0$ . Hence, we can transform the cBWT index of $\{R\}$ to index $\{T\}$ with $O(n)$ queries and operations. The claimed complexities now follow by Lemma 2.1. $\hfill\blacktriangleleft$

4.2 Extending an Existing cBWT Index

Let $\emptyset\neq\mathcal{R}\subset\Sigma^{+}$ be a non-empty set of strings and $\rho=\left|\textup{CA}_{\mathcal{R}}\right|$ denote the accumulated length of all strings in $\mathcal{R}$ . In this subsection, we assume we have the cBWT index of $\mathcal{R}$ at hand, and want to extend it by another text $S\in\Sigma^{+}$ of length $\lambda$ to index $\mathcal{S}=\mathcal{R}\cup\{S\}$ . To achieve this, we compute the integers $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ for each $i\in[1..\lambda]$ , whose definitions follow. Let $\textup{cnt}_{\mathcal{R}}(V)=\left|\{i\in[1..\rho]\mid C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[i]})\preceq_{\omega}V\}\right|$ ,

\begin{split}\textup{plcp}^{\infty}_{\mathcal{R}}(V)&=\begin{cases*}-1&\text{if $\textup{cnt}_{\mathcal{R}}(V)=0$,}\\ \textup{lcp}^{\infty}(C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[\textup{cnt}_{\mathcal{R}}(V)]}),V)&\text{otherwise, and}\end{cases*}\\ \textup{slcp}^{\infty}_{\mathcal{R}}(V)&=\begin{cases*}\textup{lcp}^{\infty}(C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[\textup{cnt}_{\mathcal{R}}(V)+1]}),V)&\text{if $\textup{cnt}_{\mathcal{R}}(V)<\rho$,}\\ -1&\text{otherwise,}\end{cases*}\end{split}

for each $V\in\Sigma_{\$}^{+}$ . We can apply the techniques exhibited in Lemma 4.2 to compute the helper values.

Algorithm 3 Computing $c=\textup{cnt}_{\mathcal{R}}(V)$, $p=\textup{plcp}^{\infty}_{\mathcal{R}}(V)$ and $s=\textup{slcp}^{\infty}_{\mathcal{R}}(V)$. Here, $\emptyset\neq\mathcal{R}\subset\Sigma^{+}$, $\rho=\left|\textup{CA}_{\mathcal{R}}\right|$, $V\in\Sigma_{\$}^{+}$, $\textup{rank}_{\infty}(V,\left|V\right|)\geq 1$, and $y=\textup{cnt}_{\mathcal{R}}(\textup{Rot}(V,1))$.

Lemma 4.6.

Let $\emptyset\neq\mathcal{R}\subset\Sigma^{+}$ , $\rho=\left|\textup{CA}_{\mathcal{R}}\right|$ , and $V\in\Sigma_{\$}^{+}$ with $\textup{rank}_{\infty}(V,\left|V\right|)\geq 1$ . Then Algorithm 3 correctly computes $\textup{cnt}_{\mathcal{R}}(V)$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(V)$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(V)$ .

Proof.

Algorithm 3 takes $\pi(V)$ , $y=\textup{cnt}_{\mathcal{R}}(\textup{Rot}(V,1))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,1))$ , $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,1))$ , and the cBWT index of $\mathcal{R}$ as input. If $\pi(V)=\$$ , then we return $\textup{cnt}_{\mathcal{R}}(V)=0$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(V)=-1$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(V)=0$ in Line 2 since $\mathcal{R}\subset\Sigma^{+}$ . Thus, assume $\pi(V)>\$$ . For each $i\in[0..\pi(V)]$ , let $J_{i}=[\ell_{i}..r_{i}]$ be maximal such that $r_{i}\leq y$ and $\textup{lcp}^{\infty}(\textup{Rot}(V,1),C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[j]}))=i$ for each $j\in J_{i}$ . The following statements are due to Lemma 3.7 and $\mathcal{R}\subset\Sigma^{+}$ .

$\blacksquare$

If $j\in[y+1..\rho]$ , then $C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[\textup{LF}_{\mathcal{R}}[j]]})\prec_{\omega}V$ if and only if $\textup{L}_{\mathcal{R}}[j]\geq\pi(V)+1$ and $\textup{lcp}^{\infty}(C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[j]}),\textup{Rot}(V,1))\geq\pi(V)+1$ .
$\blacksquare$

If $j\in[r_{\pi(V)}+1..y]$ , then $C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[\textup{LF}_{\mathcal{R}}[j]]})\prec_{\omega}V$ if and only if $\textup{L}_{\mathcal{R}}[j]\geq\pi(V)$ .
$\blacksquare$

If $i\in[0..\pi(V)]$ and $j\in J_{i}$ , then $C_{\mathcal{R}}({\textup{CA}_{\mathcal{R}}[\textup{LF}_{\mathcal{R}}[j]]})\prec_{\omega}V$ if and only if $\textup{L}_{\mathcal{R}}[j]\geq i$ .

We apply these results from Line 3 through Line 10 to compute $\textup{cnt}_{\mathcal{R}}(V)$ . Leveraging Lemma 4.1 and (the proof of) Lemma 3.12, we then compute $\textup{plcp}^{\infty}_{\mathcal{R}}(V)$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(V)$ from Line 11 through Line 16 and from Line 17 through Line 23, respectively. $\hfill\blacktriangleleft$

Lemma 4.7.

Let $\emptyset\neq\mathcal{R}\subset\Sigma^{+}$ , $\rho=\left|\textup{CA}_{\mathcal{R}}\right|$ , $V\in\Sigma_{\$}^{+}$ , and $\textup{rank}_{\infty}(V,\left|V\right|)\geq 1$ . If $\pi(V)\neq\$$ , then $\textup{cnt}_{\mathcal{R}}(V)$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(V)$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(V)$ can be computed from $\pi(V)$ , $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(V,1))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,1))$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,1))$ in $O((1+\pi(V))\frac{\lg\sigma\lg\rho}{\lg\lg\rho})$ time with the cBWT index of $\mathcal{R}$ .

Proof.

We apply Algorithm 3. Correctness follows from Lemma 4.6, and the time complexity is due to the loop of Algorithm 3 in Line 7 and Lemma 2.1. $\hfill\blacktriangleleft$

For the iterative construction, we augment the cBWT index by a dynamic bit string $E$ with the invariant that $E$ is zeroed before and after each extension with a new string. We use $E$ to temporarily mark modified parts in the cBWT such that we retain the functionality of the index even during construction. The length of $E$ is the number $\rho$ of characters indexed by the cBWT index, which we call in the next lemma the augmented cBWT index for clarity.

Lemma 4.8.

Let $\emptyset\neq\mathcal{R}\subset\Sigma^{+}$ , $\rho=\left|\textup{CA}_{\mathcal{R}}\right|$ , $S\in\Sigma^{+}$ , $\mathcal{S}=\mathcal{R}\cup\{S\}$ , $\lambda=\left|S\right|$ , and $z=\max\{\left|V\right|\mid V\in\mathcal{S}\}$ . The augmented cBWT index of $\mathcal{R}$ can be extended to index $\mathcal{S}$ in $O((\lambda+\rho)\lg\sigma)$ bits of space and $O(z\frac{\lg\sigma\lg(\lambda+\rho)}{\lg\lg(\lambda+\rho)})$ time.

Proof.

We compute the cBWT index of $\{S\}$ within the claimed complexities by Corollary 4.5. Then we compute $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ for each $i\in[1..\lambda]$ .

Case 1.: Assume $\textup{Rot}(S,i)=_{\omega}R$ for some $R\in\mathcal{R}$ and $i\in[1..\lambda]$ . Then $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))=r$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))=\textup{rank}_{\infty}% (\langle\textup{Rot}(S,i)\rangle,\lambda)$ , and, if $r<\rho$ , $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))=\textup{LCP}_{\mathcal% {R}}^{\infty}[r+1]$ by Lemma 3.3, where $i\in[1..\lambda]$ and $r$ is the right boundary of $\textup{CR}_{\mathcal{R}}(\textup{Rot}(S,i)^{\omega}[..3z])$ . Since $\textup{rank}_{\infty}(\langle\textup{Rot}(S,i-1)\rangle,\lambda)=\textup{rank% }_{\infty}(\langle S^{\omega}[i..4z]\rangle,4z-i+1)$ for each $i\in[2..\lambda+1]$ , the last $\lambda$ steps of the backward search for $S^{\omega}[2..4z]$ yield the necessary values to compute $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ for all $i\in[1..\lambda]$ in descending order within the claimed complexities by Theorem 3.15.
Case 2.: Assume $\textup{Rot}(S,i)\not=_{\omega}R$ for each $R\in\mathcal{R}$ and $i\in[1..\lambda]$ . Let $V=S^{\omega}[..4z]\cdot\$$ . By assumption and Corollary 3.4, $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))=\textup{cnt}_{\mathcal{R}}(% \textup{Rot}(V,i))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))=\textup{plcp}^{\infty}% _{\mathcal{R}}(\textup{Rot}(V,i))$ , and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))=\textup{slcp}^{\infty}% _{\mathcal{R}}(\textup{Rot}(V,i))$ for each $i\in[1..\lambda]$ . We leverage Lemma 3.14 to preprocess $V$ within the claimed complexities. Since $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(V,4z))=0$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,4z))=-1$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,4z))=0$ by construction, we can now use Lemma 4.7 to compute $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(V,j))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,j))$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(V,j))$ for each $j\in[1..4z-1]$ in descending order. Hence, we obtain $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))$ , $\textup{plcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ and $\textup{slcp}^{\infty}_{\mathcal{R}}(\textup{Rot}(S,i))$ for all $i\in[1..\lambda]$ in descending order within the claimed complexities by Lemma 3.6.

Leveraging the backward search on the cBWT index of $\{S\}$ , we store the $\textup{plcp}_{\mathcal{R}}^{\infty}$ -values and $\textup{slcp}_{\mathcal{R}}^{\infty}$ -values in lex-order, which takes $O(\lambda\lg\sigma)$ bits of space. To store the $\textup{cnt}_{\mathcal{R}}$ -values in lex-order and stay within the claimed time and space complexity, we utilize $E$ , which we assume to be represented by the data structure of Lemma 2.1. For each $i\in[1..\lambda]$ in descending order, we call $\textup{insert}_{E}(\textup{select}_{0}(E,\textup{cnt}_{\mathcal{R}}(\textup{% Rot}(S,i)))+1,1)$ , and discard $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i))$ once $\textup{cnt}_{\mathcal{R}}(\textup{Rot}(S,i-1))$ has been computed. Then $\textup{cnt}_{\mathcal{R}}(C_{\{S\}}({\textup{CA}_{\{S\}}[i]}))=\textup{rank}_% {0}(E,\textup{select}_{1}(E,i))$ . The following statements for each $i\in[1..\lambda+\rho]$ are due to construction.

$\blacksquare$ If $E[i]=0$, then $\textup{F}_{\mathcal{S}}[i]=\textup{F}_{\mathcal{R}}[\textup{rank}_{0}(E,i)]$ and $\textup{L}_{\mathcal{S}}[i]=\textup{L}_{\mathcal{R}}[\textup{rank}_{0}(E,i)]$.

$\blacksquare$ If $E[i]=1$, then $\textup{F}_{\mathcal{S}}[i]=\textup{F}_{\{S\}}[\textup{rank}_{1}(E,i)]$ and $\textup{L}_{\mathcal{S}}[i]=\textup{L}_{\{S\}}[\textup{rank}_{1}(E,i)]$.

$\blacksquare$ $\textup{LCP}_{\mathcal{S}}^{\infty}[1]=0$.

$\blacksquare$ If $i\geq 2$, $E[i-1]=1$ and $E[i]=1$, then $\textup{LCP}_{\mathcal{S}}^{\infty}[i]=\textup{LCP}_{\{S\}}^{\infty}[\textup{rank}_{1}(E,i)]$.

$\blacksquare$ If $i\geq 2$, $E[i-1]=1$ and $E[i]=0$, then $\textup{LCP}_{\mathcal{S}}^{\infty}[i]=\textup{slcp}^{\infty}_{\mathcal{R}}(C_{\{S\}}({\textup{CA}_{\{S\}}[\textup{rank}_{1}(E,i)]}))$.

$\blacksquare$ If $i\geq 2$, $E[i-1]=0$ and $E[i]=1$, then $\textup{LCP}_{\mathcal{S}}^{\infty}[i]=\textup{plcp}^{\infty}_{\mathcal{R}}(C_{\{S\}}({\textup{CA}_{\{S\}}[\textup{rank}_{1}(E,i)]}))$.

$\blacksquare$ If $i\geq 2$, $E[i-1]=0$ and $E[i]=0$, then $\textup{LCP}_{\mathcal{S}}^{\infty}[i]=\textup{LCP}_{\mathcal{R}}^{\infty}[\textup{rank}_{0}(E,i)]$.
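The seven statements above amount to a single left-to-right merge driven by $E$. The following Python sketch illustrates this with plain lists. It is not the in-place update on the dynamic strings of Lemma 2.1 that the proof relies on, and the names `merge_by_E`, `idx_R`, `idx_S`, `plcp` and `slcp` are hypothetical. The lists `plcp` and `slcp` are assumed to hold the $\textup{plcp}^{\infty}_{\mathcal{R}}$-values and $\textup{slcp}^{\infty}_{\mathcal{R}}$-values of the rotations of $S$ in lex-order, and all entries are treated as opaque values.

```python
def merge_by_E(E, idx_R, idx_S, plcp, slcp):
    """Interleave F, L and LCP of two indexes according to the bit list E.

    E[i] == 0 takes the next row of idx_R, E[i] == 1 the next row of idx_S.
    idx_R and idx_S are dicts with equally long lists under "F", "L", "LCP".
    Python lists are 0-based, whereas the statements in the text are 1-based.
    """
    F, L, LCP = [], [], []
    r = s = 0      # rank_0(E, i) and rank_1(E, i) for the current position i
    prev = None    # E[i-1], or None at the first position
    for bit in E:
        src = idx_R if bit == 0 else idx_S
        if bit == 0:
            r += 1
            k = r
        else:
            s += 1
            k = s
        F.append(src["F"][k - 1])
        L.append(src["L"][k - 1])
        if prev is None:
            LCP.append(0)                    # LCP[1] = 0
        elif prev == 1 and bit == 1:
            LCP.append(idx_S["LCP"][s - 1])  # both rows stem from {S}
        elif prev == 1 and bit == 0:
            LCP.append(slcp[s - 1])          # preceding S-row vs. its successor in R
        elif prev == 0 and bit == 1:
            LCP.append(plcp[s - 1])          # current S-row vs. its predecessor in R
        else:
            LCP.append(idx_R["LCP"][r - 1])  # both rows stem from R
        prev = bit
    return F, L, LCP


# Toy input with placeholder values (not derived from real texts).
E = [0, 1, 1, 0, 0, 1]
idx_R = {"F": ["a", "b", "c"], "L": ["x", "y", "z"], "LCP": [0, 10, 11]}
idx_S = {"F": ["p", "q", "r"], "L": ["u", "v", "w"], "LCP": [0, 20, 21]}
F, L, LCP = merge_by_E(E, idx_R, idx_S, plcp=[30, 31, 32], slcp=[40, 41, 42])
```

The running counters `r` and `s` replace the rank queries, which is possible here only because the sketch scans $E$ once from left to right.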

Consequently, $O(\lambda)$ queries and operations are needed to update the cBWT index of $\mathcal{R}$ to index $\mathcal{S}$ . Finally, we zero $E$ . The claimed complexities then follow from Lemma 2.1. $\hfill\blacktriangleleft$
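The insertion step with $E$ used in the proof above can be checked on a small example. The sketch below uses a naive Python list instead of the data structure of Lemma 2.1, so every operation takes linear time. The class `NaiveBits` and the helper names are hypothetical, and the convention $\textup{select}_{0}(E,0)=0$ is an assumption made here so that a count of zero inserts at the front.

```python
class NaiveBits:
    """Plain-list stand-in for the dynamic bit string E (1-based positions)."""

    def __init__(self, length):
        self.bits = [0] * length  # zeroed bit string

    def insert(self, pos, bit):
        # Insert `bit` so that it becomes the pos-th entry.
        self.bits.insert(pos - 1, bit)

    def rank(self, bit, i):
        # Number of occurrences of `bit` in E[1..i].
        return sum(1 for b in self.bits[:i] if b == bit)

    def select(self, bit, k):
        # Position of the k-th occurrence of `bit`; 0 for k == 0 (assumed convention).
        if k == 0:
            return 0
        seen = 0
        for pos, b in enumerate(self.bits, start=1):
            if b == bit:
                seen += 1
                if seen == k:
                    return pos
        raise ValueError("fewer than k occurrences")


def record_counts(rho, cnt_values):
    """Insert one 1-bit per count, directly after the cnt-th 0-bit."""
    E = NaiveBits(rho)
    for c in cnt_values:  # arrival order, i.e. not lex-order
        E.insert(E.select(0, c) + 1, 1)
    return E


def counts_in_lex_order(E, lam):
    """Read back rank_0(E, select_1(E, i)) for i = 1..lam."""
    return [E.rank(0, E.select(1, i)) for i in range(1, lam + 1)]


# Toy input: rho = 5 zeros and five made-up counts in arrival order.
E = record_counts(5, [3, 0, 3, 5, 1])
recovered = counts_in_lex_order(E, 5)
```

On this input `recovered` is `[0, 1, 3, 3, 5]`, the counts in non-decreasing order. Since a lexicographically larger rotation cannot have a smaller count, this is the lex-order reading that the proof needs, regardless of the order in which the counts were inserted.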

We are finally able to state the main result of this section.

Theorem 4.9.

Let $\emptyset\neq\mathcal{T}=\{T_{1},...,T_{d}\}\subset\Sigma^{+}$ and $n=\left|T_{1}\cdots T_{d}\right|$. The cBWT index of $\mathcal{T}$ can be constructed in $O(n\lg\sigma)$ bits of space and $O(n\frac{\lg\sigma\lg n}{\lg\lg n})$ time.

Proof.

Without loss of generality, let $\left|T_{k-1}\right|\leq\left|T_{k}\right|$ for each $k\in[2..d]$, and let $E_{j}$ denote a zeroed bit string of length $\left|T_{1}\cdots T_{j}\right|$ for each $j\in[1..d]$. First, we construct $E_{1}$ and the cBWT index of $\{T_{1}\}$ within the claimed complexities by Lemma 4.5, where each string is represented by the data structure of Lemma 2.1. Subsequently, we iteratively extend the cBWT index of $\{T_{1},...,T_{k-1}\}$ augmented by $E_{k-1}$ to the cBWT index of $\{T_{1},...,T_{k}\}$ augmented by $E_{k}$ for each $k\in[2..d]$ in ascending order, leveraging Lemma 4.8. Then the cBWT index of $\mathcal{T}$ is constructed in $O((n+\left|T_{d}\right|)\lg\sigma)=O(n\lg\sigma)$ bits of space and $O((\sum_{k=1}^{d}\left|T_{k}\right|)\frac{\lg\sigma\lg n}{\lg\lg n})=O(n\frac{\lg\sigma\lg n}{\lg\lg n})$ time. $\hfill\blacktriangleleft$
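The control flow of this proof is short enough to sketch. In the Python sketch below, `build_single` and `extend` are hypothetical callbacks standing in for the constructions of Lemma 4.5 and Lemma 4.8. They are not implemented here, and the stand-ins in the toy run only record the order in which the texts are processed.

```python
def construct_index(texts, build_single, extend):
    """Process the texts in non-decreasing order of length.

    build_single(t) is assumed to return an index of {t}, and
    extend(index, t) an index that additionally covers t.
    """
    ordered = sorted(texts, key=len)   # |T_{k-1}| <= |T_k|
    index = build_single(ordered[0])   # index of {T_1}
    for t in ordered[1:]:              # k = 2, ..., d in ascending order
        index = extend(index, t)
    return index


# Toy run with stand-in callbacks that merely log the processing order.
order = construct_index(
    ["abcab", "ab", "abc"],
    build_single=lambda t: [t],
    extend=lambda index, t: index + [t],
)
```

With these stand-ins, `order` lists the texts from shortest to longest, which is the order assumed at the start of the proof.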

References

[1] Bastien Auvray, Julien David, Richard Groult, and Thierry Lecroq. Approximate Cartesian tree matching: An approach using swaps. In Proc. SPIRE, volume 14240 of LNCS, pages 49–61, 2023. doi:10.1007/978-3-031-43980-3_5.
[2] Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, and Marinella Sciortino. r-indexing the eBWT. In Proc. SPIRE, volume 12944 of LNCS, pages 3–12, 2021. doi:10.1007/978-3-030-86692-1_1.
[3] Erik D. Demaine, Gad M. Landau, and Oren Weimann. On Cartesian trees and range minimum queries. Algorithmica, 68(3):610–625, 2014. doi:10.1007/s00453-012-9683-x.
[4] Simone Faro, Thierry Lecroq, Kunsoo Park, and Stefano Scafiti. On the longest common Cartesian substring problem. The Computer Journal, 66(4):907–923, 2022. doi:10.1093/COMJNL/BXAB204.
[5] Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In Proc. FOCS, pages 390–398, 2000. doi:10.1109/SFCS.2000.892127.
[6] Johannes Fischer. Optimal succinctness for range minimum queries. In Proc. LATIN, volume 6034 of LNCS, pages 158–169, 2010. doi:10.1007/978-3-642-12200-2_16.
[7] Peter Foster, Anssi Klapuri, and Simon Dixon. A method for identifying repetition structure in musical audio based on time series prediction. In Proc. EUSIPCO, pages 1299–1303. IEEE, 2012. URL: https://ieeexplore.ieee.org/document/6334323/.
[8] Tak-Chung Fu, Korris Fu-Lai Chung, Robert Wing Pong Luk, and Chak-man Ng. Stock time series pattern matching: Template-based vs. rule-based approaches. Eng. Appl. Artif. Intell., 20(3):347–364, 2007. doi:10.1016/J.ENGAPPAI.2006.07.003.
[9] Mitsuru Funakoshi, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing maximal palindromes in non-standard matching models. In Proc. IWOCA, volume 14764 of LNCS, pages 165–179, 2024. doi:10.1007/978-3-031-63021-7_13.
[10] Paweł Gawrychowski, Samah Ghazawi, and Gad M. Landau. On indeterminate strings matching. In Proc. CPM, volume 161 of LIPIcs, pages 14:1–14:14, 2020. doi:10.4230/LIPICS.CPM.2020.14.
[11] Daiki Hashimoto, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Computing the parameterized Burrows–Wheeler transform online. In Proc. SPIRE, volume 13617 of LNCS, pages 70–85, 2022. doi:10.1007/978-3-031-20643-6_6.
[12] Yuzuru Hiraga. Structural recognition of music by pattern matching. In Proc. ICMC. Michigan Publishing, 1997. URL: https://hdl.handle.net/2027/spo.bbp2372.1997.113.
[13] Wing-Kai Hon, Tsung-Han Ku, Chen-Hua Lu, Rahul Shah, and Sharma V. Thankachan. Efficient algorithm for circular Burrows–Wheeler transform. In Proc. CPM, volume 7354 of LNCS, pages 257–268, 2012. doi:10.1007/978-3-642-31265-6_21.
[14] Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, and Ayumi Shinohara. Breaking a barrier in constructing compact indexes for parameterized pattern matching. In Proc. ICALP, volume 297 of LIPIcs, pages 89:1–89:19, 2024. doi:10.4230/LIPICS.ICALP.2024.89.
[15] Natsumi Kikuchi, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara. Computing covers under substring consistent equivalence relations. In Proc. SPIRE, volume 12303 of LNCS, pages 131–146, 2020. doi:10.1007/978-3-030-59212-7_10.
[16] Jinil Kim, Peter Eades, Rudolf Fleischer, Seok-Hee Hong, Costas S. Iliopoulos, Kunsoo Park, Simon J. Puglisi, and Takeshi Tokuyama. Order-preserving matching. Theor. Comput. Sci., 525:68–79, 2014. doi:10.1016/J.TCS.2013.10.006.
[17] Sung-Hwan Kim and Hwan-Gue Cho. A compact index for Cartesian tree matching. In Proc. CPM, volume 191 of LIPIcs, pages 18:1–18:19, 2021. doi:10.4230/LIPIcs.CPM.2021.18.
[18] Sungmin Kim and Yo-Sub Han. Approximate Cartesian tree pattern matching. In Proc. DLT, volume 14791 of LNCS, pages 189–202, 2024. doi:10.1007/978-3-031-66159-4_14.
[19] Marcin Kubica, Tomasz Kulczyński, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett., 113(12):430–433, 2013. doi:10.1016/J.IPL.2013.03.015.
[20] Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. An extension of the Burrows–Wheeler transform. Theor. Comput. Sci., 387(3):298–312, 2007. doi:10.1016/j.tcs.2007.07.014.
[21] Yoshiaki Matsuoka, Takahiro Aoki, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Generalized pattern matching and periodicity under substring consistent equivalence relations. Theor. Comput. Sci., 656:225–233, 2016. doi:10.1016/j.tcs.2016.02.017.
[22] Gonzalo Navarro and Kunihiko Sadakane. Fully functional static and dynamic succinct trees. ACM Trans. Algorithms, 10(3):16:1–16:39, 2014. doi:10.1145/2601073.
[23] Akio Nishimoto, Noriki Fujisato, Yuto Nakashima, and Shunsuke Inenaga. Position heaps for Cartesian-tree matching on strings and tries. In Proc. SPIRE, volume 12944 of LNCS, pages 241–254, 2021. doi:10.1007/978-3-030-86692-1_20.
[24] Tsubasa Oizumi, Takeshi Kai, Takuya Mieno, Shunsuke Inenaga, and Hiroki Arimura. Cartesian tree subsequence matching. In Proc. CPM, volume 223 of LIPIcs, pages 14:1–14:18, 2022. doi:10.4230/LIPICS.CPM.2022.14.
[25] Eric M. Osterkamp and Dominik Köppl. Extending the parameterized Burrows–Wheeler transform. In Proc. DCC, pages 143–152, 2024. doi:10.1109/DCC58796.2024.00022.
[26] Sung Gwan Park, Magsarjav Bataa, Amihood Amir, Gad M. Landau, and Kunsoo Park. Finding patterns and periods in Cartesian tree matching. Theor. Comput. Sci., 845:181–197, 2020. doi:10.1016/J.TCS.2020.09.014.
[27] Siwoo Song, Geonmo Gu, Cheol Ryu, Simone Faro, Thierry Lecroq, and Kunsoo Park. Fast algorithms for single and multiple pattern Cartesian tree matching. Theor. Comput. Sci., 849:47–63, 2021. doi:10.1016/J.TCS.2020.10.009.
[28] Jean Vuillemin. A unifying look at data structures. Commun. ACM, 23(4):229–239, 1980. doi:10.1145/358841.358852.

