,
Paweł Gawrychowski
,
Inge Li Gørtz
,
Simon R. Tarnow
Creative Commons Attribution 4.0 International license
The classical string indexing problem asks to preprocess the input string S for efficient pattern matching queries. Bille, Fischer, Gørtz, Pedersen, and Stordalen [CPM 2023] generalized this to the {streaming sliding window string indexing} problem, where the input string S arrives as a stream, and we are asked to maintain an index of the last w characters, called the window. Further, at any point in time, a pattern P might appear, again given as a stream, and all occurrences of P in the current window must be output. We require that the time to process each character of the text or the pattern is worst-case. It appears that standard string indexing structures, such as suffix trees, do not provide an efficient solution in such a setting, as to obtain a good worst-case bound, they necessarily need to work right-to-left, and we cannot reverse the pattern while keeping a worst-case guarantee on the time to process each of its characters. Nevertheless, it is possible to obtain a bound of 𝒪(log w) (with high probability) by maintaining a hierarchical structure of multiple suffix trees.
We significantly improve this upper bound by designing a black-box reduction to maintain a suffix tree under prepending characters to the current text. By plugging in the known results, this allows us to obtain a bound of 𝒪(log log w +log log σ) (with high probability), where σ is the size of the alphabet. Further, we introduce an even more general problem, called the {streaming dynamic window string indexing}, where the goal is to maintain the current text under adding and deleting characters at either end and design a similar black-box reduction.
@InProceedings{bille_et_al:LIPIcs.CPM.2024.8,
author = {Bille, Philip and Gawrychowski, Pawe{\l} and G{\o}rtz, Inge Li and Tarnow, Simon R.},
title = {{Faster Sliding Window String Indexing in Streams}},
booktitle = {35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
pages = {8:1--8:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-326-3},
ISSN = {1868-8969},
year = {2024},
volume = {296},
editor = {Inenaga, Shunsuke and Puglisi, Simon J.},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.8},
URN = {urn:nbn:de:0030-drops-201183},
doi = {10.4230/LIPIcs.CPM.2024.8},
annote = {Keywords: data structures, pattern matching, string indexing}
}