We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary D in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in D is given as a fragment of T. This way, D takes space proportional to the number of patterns d=|D| rather than their total length, which could be Theta(n * d). In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from D in a fragment T[i..j] (operations Report(i,j) and Count(i,j) below, as well as operation Exists(i,j) that returns true iff Count(i,j)>0) and reporting distinct patterns from D that occur in T[i..j] (operation ReportDistinct(i,j)). We show how to construct, in O((n+d) log^{O(1)} n) time, a data structure that answers each of these queries in time O(log^{O(1)} n+|output|) - see the table below for specific time and space complexities. Query | Preprocessing time | Space | Query time Exists(i,j) | O(n+d) | O(n) | O(1) Report(i,j) | O(n+d) | O(n+d) | O(1+|output|) ReportDistinct(i,j) | O(n log n+d) | O(n+d) | O(log n+|output|) Count(i,j) | O({n log n}/{log log n} + d log^{3/2} n) | O(n+d log n) | O({log^2n}/{log log n}) The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight - up to subpolynomial factors - upper and lower bounds for the case of a dynamic dictionary.
@InProceedings{charalampopoulos_et_al:LIPIcs.ISAAC.2019.22, author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz}, title = {{Internal Dictionary Matching}}, booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)}, pages = {22:1--22:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-130-6}, ISSN = {1868-8969}, year = {2019}, volume = {149}, editor = {Lu, Pinyan and Zhang, Guochuan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.22}, URN = {urn:nbn:de:0030-drops-115182}, doi = {10.4230/LIPIcs.ISAAC.2019.22}, annote = {Keywords: string algorithms, dictionary matching, internal pattern matching} }
Feedback for Dagstuhl Publishing