Charalampopoulos, Panagiotis ;
Kociumaka, Tomasz ;
Mohamed, Manal ;
Radoszewski, Jakub ;
Rytter, Wojciech ;
Walen, Tomasz
Internal Dictionary Matching
Abstract
We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary D in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in D is given as a fragment of T. This way, D takes space proportional to the number of patterns d = |D| rather than their total length, which could be Θ(nd).
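As a small illustration of the space claim above, the following sketch stores each pattern as a pair of indices into T and compares that with the total length of the patterns written out explicitly. The text, the chosen fragments, the 0-based inclusive indexing, and the variable names are all assumptions made for this example, not taken from the paper.

```python
# Hypothetical example: an internal dictionary kept as (start, end) index pairs.
T = "ab" * 8                      # text of length n = 16
n = len(T)

# d = 4 patterns, each the fragment T[s..e] (0-based, inclusive) of length n/2 + 1.
D = [(k, k + n // 2) for k in range(4)]

# Two integers per pattern: proportional to d, independent of pattern lengths.
pairs_space = 2 * len(D)

# Characters needed if every pattern were stored as its own string.
explicit_space = sum(e - s + 1 for s, e in D)

print(pairs_space, explicit_space)
```

With d fragments of length about n/2 each, the explicit representation grows as n times d, while the index pairs stay proportional to d.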
In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from D in a fragment T[i..j] (operations Report(i,j) and Count(i,j) below, as well as operation Exists(i,j) that returns true iff Count(i,j)>0) and reporting distinct patterns from D that occur in T[i..j] (operation ReportDistinct(i,j)). We show how to construct, in O((n+d) log^{O(1)} n) time, a data structure that answers each of these queries in time O(log^{O(1)} n + |output|); see the table below for specific time and space complexities.
Query                Preprocessing time                      Space           Query time
Exists(i,j)          O(n+d)                                  O(n)            O(1)
Report(i,j)          O(n+d)                                  O(n+d)          O(1 + |output|)
ReportDistinct(i,j)  O(n log n + d)                          O(n+d)          O(log n + |output|)
Count(i,j)           O(n log n / log log n + d log^{3/2} n)  O(n + d log n)  O(log^2 n / log log n)
The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight (up to subpolynomial factors) upper and lower bounds for the case of a dynamic dictionary.
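To make the semantics of the four queries concrete, here is a naive brute-force sketch. It is not the data structure from the paper and comes nowhere near the stated bounds; it only pins down what each query is expected to return. The function names, the 0-based inclusive indexing, the (pattern index, start position) output format, and the assumption that the patterns in D are pairwise distinct strings are all choices made for this illustration.

```python
def occurrences(T, D, i, j):
    """Yield (pattern_index, start) for every occurrence of a pattern of D
    that lies entirely inside T[i..j] (0-based, inclusive)."""
    for p, (s, e) in enumerate(D):
        pat = T[s:e + 1]          # the pattern is itself a fragment of T
        m = len(pat)
        # An occurrence starting at `start` must end at or before position j.
        for start in range(i, j - m + 2):
            if T[start:start + m] == pat:
                yield p, start

def report(T, D, i, j):
    return list(occurrences(T, D, i, j))

def count(T, D, i, j):
    return sum(1 for _ in occurrences(T, D, i, j))

def exists(T, D, i, j):
    return any(True for _ in occurrences(T, D, i, j))

def report_distinct(T, D, i, j):
    # Indices of patterns with at least one occurrence in T[i..j].
    return sorted({p for p, _ in occurrences(T, D, i, j)})

# Example: T = "abaab" with patterns "ab", "a", "aba" given as fragments of T.
T = "abaab"
D = [(0, 1), (2, 2), (0, 2)]
print(report(T, D, 0, 4))
print(count(T, D, 1, 3), report_distinct(T, D, 1, 3), exists(T, D, 1, 1))
```

A baseline like this can serve as a correctness oracle when testing a faster implementation on small inputs.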
BibTeX Entry
@InProceedings{charalampopoulos_et_al:LIPIcs:2019:11518,
author = {Panagiotis Charalampopoulos and Tomasz Kociumaka and Manal Mohamed and Jakub Radoszewski and Wojciech Rytter and Tomasz Walen},
title = {{Internal Dictionary Matching}},
booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)},
pages = {22:1--22:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-130-6},
ISSN = {1868-8969},
year = {2019},
volume = {149},
editor = {Pinyan Lu and Guochuan Zhang},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2019/11518},
URN = {urn:nbn:de:0030-drops-115182},
doi = {10.4230/LIPIcs.ISAAC.2019.22},
annote = {Keywords: string algorithms, dictionary matching, internal pattern matching}
}
Keywords: string algorithms, dictionary matching, internal pattern matching
Seminar: 30th International Symposium on Algorithms and Computation (ISAAC 2019)
Issue date: 2019
Date of publication: 28.11.2019