,
Diptarama Hendrian
,
Dominik Köppl
,
Ryo Yoshinaka
,
Ayumi Shinohara
Creative Commons Attribution 4.0 International license
A parameterized string (p-string) is a string over an alphabet (Σ_s ∪ Σ_p), where Σ_s and Σ_p are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings x and y are said to parameterized match (p-match) if and only if x can be transformed into y by applying a bijection on Σ_p to every occurrence of p-symbols in x. The indexing problem for p-matching is to preprocess a p-string T of length n so that we can efficiently find the occurrences of substrings of T that p-match with a given pattern. Let σ_s and respectively σ_p be the numbers of distinct s-symbols and p-symbols that appear in T and σ = σ_s + σ_p. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed parameterized BWTs (pBWTs) to design the first compact index for p-matching, and posed an open problem on how to construct the pBWT-based index in compact space, i.e., in O(n lg |Σ_s ∪ Σ_p|) bits of space. Hashimoto et al. [SPIRE 2022] showed how to construct the pBWT for T, under the assumption that Σ_s ∪ Σ_p = [0..O(σ)], in O(n lg σ) bits of space and O(n (σ_p lg n)/(lg lg n)) time in an online manner while reading the symbols of T from right to left. In this paper, we refine Hashimoto et al.’s algorithm to work in O(n lg σ) bits of space and O(n (lg σ_p lg n)/(lg lg n)) time in a more general assumption that Σ_s ∪ Σ_p = [0..n^{O(1)}]. Our result has an immediate application to constructing parameterized suffix arrays in O(n (lg σ_p lg n)/(lg lg n)) time and O(n lg σ) bits of working space. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.
@InProceedings{iseri_et_al:LIPIcs.ICALP.2024.89,
author = {Iseri, Kento and I, Tomohiro and Hendrian, Diptarama and K\"{o}ppl, Dominik and Yoshinaka, Ryo and Shinohara, Ayumi},
title = {{Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching}},
booktitle = {51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
pages = {89:1--89:19},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-322-5},
ISSN = {1868-8969},
year = {2024},
volume = {297},
editor = {Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.89},
URN = {urn:nbn:de:0030-drops-202324},
doi = {10.4230/LIPIcs.ICALP.2024.89},
annote = {Keywords: Index for parameterized pattern matching, Parameterized Burrows-Wheeler Transform, Online construction}
}