,
Austin Nguyen
,
Reid F. Thompson
Creative Commons Attribution 4.0 International license
Let G = (V, E) be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet Ω, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of G into a weakly connected digraph G' = (V', E') that enables solving the decision problem of whether there is a walk in G matching an arbitrarily long query string q in time linear in |q| and independent of |E| and |V|. We show G is uniquely determined by G' when for every v_𝓁 ∈ V, there is some distinct string s_𝓁 on Ω such that v_𝓁 is the origin of a closed walk in G matching s_𝓁, and no other walk in G matches s_𝓁 unless it starts and ends at v_𝓁. We then exploit this invertibility condition to strategically alter any G so its transform G' enables retrieval of all t terminal vertices of walks in the unaltered G matching q in O(|q| + t log |V|) time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
@InProceedings{nellore_et_al:LIPIcs.CPM.2021.20,
author = {Nellore, Abhinav and Nguyen, Austin and Thompson, Reid F.},
title = {{An Invertible Transform for Efficient String Matching in Labeled Digraphs}},
booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
pages = {20:1--20:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-186-3},
ISSN = {1868-8969},
year = {2021},
volume = {191},
editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.20},
URN = {urn:nbn:de:0030-drops-139717},
doi = {10.4230/LIPIcs.CPM.2021.20},
annote = {Keywords: pattern matching, string matching, Burrows-Wheeler transform, labeled graphs}
}