Succinct Online Dictionary Matching with Improved Worst-Case Guarantees
In the online dictionary matching problem the goal is to preprocess a set of patterns D={P_1,...,P_d} over an alphabet Sigma, so that given an online text (arriving one character at a time) we report all occurrences of patterns that are suffixes of the current text before the next character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels from a universe of size lambda. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees: given a node and a label, return the lowest proper ancestor of the node that has the queried label.
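To make the LLA query concrete, here is a minimal, non-succinct sketch that simply walks up the tree. The parent-array encoding, the per-node label sets, and the function name `lowest_labeled_ancestor` are illustrative assumptions, not the representation introduced in the paper; the walk takes time proportional to the node's depth, so it shows only what the query returns and not how to answer it quickly.

```python
from typing import Optional, Sequence, Set


def lowest_labeled_ancestor(
    parent: Sequence[Optional[int]],
    labels: Sequence[Set[str]],
    node: int,
    label: str,
) -> Optional[int]:
    """Return the lowest proper ancestor of `node` carrying `label`, or None.

    Assumed encoding (hypothetical): parent[v] is the parent of v, with
    None for the root, and labels[v] is the set of labels of node v.
    """
    current = parent[node]  # start above `node`: only proper ancestors count
    while current is not None:
        if label in labels[current]:
            return current
        current = parent[current]
    return None


# Small example tree: 0 is the root, 1 is a child of 0, 2 and 3 are children of 1.
parent = [None, 0, 1, 1]
labels = [{"a", "b"}, {"b"}, {"a"}, set()]
```

With this tree, querying node 2 for label "a" skips node 2 itself, because the query ranges over proper ancestors only.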
We introduce a succinct representation of multi-labeled trees for lambda=omega(1) that supports LLA queries in O(log(log(lambda))) time. Using this representation, we obtain a succinct data structure for the online dictionary matching problem when the alphabet size sigma=|Sigma| satisfies sigma=omega(1). In this solution the worst-case cost per character is O(log(log(sigma)) + occ) time, where occ is the size of the current output.
Moreover, the amortized cost per character is O(1+occ) time.
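For orientation, the online interface described above can be sketched with the textbook (non-succinct) Aho-Corasick automaton. This is not the paper's data structure: the class name `OnlineMatcher`, the method `feed`, and the dictionary-based transitions are assumptions made for illustration, and the patterns are assumed to be distinct and non-empty.

```python
from collections import deque
from typing import Dict, Iterable, List


class OnlineMatcher:
    """Textbook Aho-Corasick automaton with a one-character-at-a-time API."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self.patterns = list(patterns)
        self.goto: List[Dict[str, int]] = [{}]  # trie transitions per state
        self.fail = [0]      # failure link: longest proper suffix in the trie
        self.word = [-1]     # index of the pattern ending here, or -1
        self.out_link = [0]  # nearest suffix state ending a pattern (0 = none)
        self.state = 0
        # Build the trie of all patterns.
        for idx, pattern in enumerate(self.patterns):
            s = 0
            for ch in pattern:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.word.append(-1)
                    self.out_link.append(0)
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.word[s] = idx
        # Breadth-first pass to set failure and output links.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, child in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                target = self.fail[child]
                self.out_link[child] = (
                    target if self.word[target] != -1 else self.out_link[target]
                )
                queue.append(child)

    def feed(self, ch: str) -> List[str]:
        """Consume one text character; return patterns that end at it."""
        s = self.state
        while s and ch not in self.goto[s]:  # follow failure links
            s = self.fail[s]
        s = self.goto[s].get(ch, 0)
        self.state = s
        found = []
        t = s if self.word[s] != -1 else self.out_link[s]
        while t:  # one step per reported occurrence
            found.append(self.patterns[self.word[t]])
            t = self.out_link[t]
        return found


matcher = OnlineMatcher(["he", "she", "his", "hers"])
reports = [matcher.feed(ch) for ch in "ushers"]
```

In this textbook version the failure-link loop in `feed` can take many steps on a single character, which is why its per-character bound holds only in the amortized sense; the reporting loop accounts for the occ term.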
Succinct indexing
dictionary matching
Aho-Corasick
labeled trees
6:1-6:13
Regular Paper
Tsvi Kopelowitz
Ely Porat
Yaron Rozen
10.4230/LIPIcs.CPM.2016.6
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode