,
Takuya Mieno
Creative Commons Attribution 4.0 International license
The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for preprocessing and a pattern P of length m as a query, the goal is to report all occurrences of P as substrings of T. Navarro and Thankachan [CPM 2015, Theor. Comput. Sci. 2016] introduced a variant of this problem called the gap-bounded consecutive occurrence query, which reports pairs of consecutive occurrences of P in T such that their gaps (i.e., the distances between them) lie within a query-specified range [g₁, g₂]. Recently, Bille et al. [FSTTCS 2020, Theor. Comput. Sci. 2022] proposed the top-k close consecutive occurrence query, which reports the k closest consecutive occurrences of P in T, sorted in non-decreasing order of distance. Both problems are optimally solved in query time with O(n log n)-space data structures.
In this paper, we generalize these problems to the range query model, which focuses only on occurrences of P in a specified substring T[a.. b] of T. Our contributions are as follows:
- We propose an O(n log² n)-space data structure that answers the range top-k consecutive occurrence query in O(|P| + log log n + k) time.
- We propose an O(n log^{2+ε} n)-space data structure that answers the range gap-bounded consecutive occurrence query in O(|P| + log log n + output) time, where ε is a positive constant and output denotes the number of outputs. Additionally, as by-products, we present algorithms for geometric problems involving weighted horizontal segments in a 2D plane, which are of independent interest.
@InProceedings{akram_et_al:LIPIcs.CPM.2025.24,
author = {Akram, Waseem and Mieno, Takuya},
title = {{Sorted Consecutive Occurrence Queries in Substrings}},
booktitle = {36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
pages = {24:1--24:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-369-0},
ISSN = {1868-8969},
year = {2025},
volume = {331},
editor = {Bonizzoni, Paola and M\"{a}kinen, Veli},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.24},
URN = {urn:nbn:de:0030-drops-231187},
doi = {10.4230/LIPIcs.CPM.2025.24},
annote = {Keywords: string algorithm, consecutive occurrences, suffix tree}
}