LIPIcs.CPM.2024.8.pdf
- Filesize: 1.09 MB
- 14 pages
The classical string indexing problem asks to preprocess the input string S for efficient pattern matching queries. Bille, Fischer, Gørtz, Pedersen, and Stordalen [CPM 2023] generalized this to the {streaming sliding window string indexing} problem, where the input string S arrives as a stream, and we are asked to maintain an index of the last w characters, called the window. Further, at any point in time, a pattern P might appear, again given as a stream, and all occurrences of P in the current window must be output. We require that the time to process each character of the text or the pattern is worst-case. It appears that standard string indexing structures, such as suffix trees, do not provide an efficient solution in such a setting, as to obtain a good worst-case bound, they necessarily need to work right-to-left, and we cannot reverse the pattern while keeping a worst-case guarantee on the time to process each of its characters. Nevertheless, it is possible to obtain a bound of 𝒪(log w) (with high probability) by maintaining a hierarchical structure of multiple suffix trees. We significantly improve this upper bound by designing a black-box reduction to maintain a suffix tree under prepending characters to the current text. By plugging in the known results, this allows us to obtain a bound of 𝒪(log log w +log log σ) (with high probability), where σ is the size of the alphabet. Further, we introduce an even more general problem, called the {streaming dynamic window string indexing}, where the goal is to maintain the current text under adding and deleting characters at either end and design a similar black-box reduction.
Feedback for Dagstuhl Publishing