In the all-pairs suffix-prefix (APSP) problem, we are given a dictionary R of k strings, S_1,…,S_k, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, for all i,j ∈ [1,k]. APSP is a classic problem in string algorithms with many applications in bioinformatics. When all strings of the dictionary are over an integer alphabet of size σ ≤ n^𝒪(1), APSP can be solved in the optimal 𝒪(n+k²) time with the use of the generalized suffix tree of the dictionary [Gusfield et al., Inf. Process. Lett. 1992]. In many bioinformatics applications, such as in sequence assembly, the size k of dictionary R is very large. In particular, k² usually dominates n, and thus the k² factor is the bottleneck both in the time and in the space complexity of such applications. We thus initiate a holistic study on several data structure variants of APSP. In particular, we consider the following types of queries: - One-to-One(i,j): output SPL_{i,j}. - One-to-All(i): output SPL_{i,j} for every j ∈ [1,k]. - Report(i,𝓁): output all distinct j ∈ [1,k] such that SPL_{i,j} ≥ 𝓁, where 𝓁 ≥ 0 is an integer. - Count(i,𝓁): output the number of distinct j ∈ [1,k] such that SPL_{i,j} ≥ 𝓁, where 𝓁 ≥ 0 is an integer. - Top(i,K): output K distinct j ∈ [1,k] with the highest values of SPL_{i,j} breaking ties arbitrarily. We assume the standard word RAM model of computation with word size w = Ω(log n) and an integer alphabet of size σ ≤ n^𝒪(1). We show the following upper bounds: Query | Space (words) | Query time | Note One-to-One(i,j) | 𝒪(n) | 𝒪(log log k) | Theorem 11 One-to-All(i) | 𝒪(n) | 𝒪(k) | Theorem 14 Report(i,𝓁) | 𝒪(n) | 𝒪(log n/log log n+output) | Theorem 19(i) Count(i,𝓁) | 𝒪(n) | 𝒪(log n/log log n) | Theorem 19(ii) Top(i,K) | 𝒪(n) | 𝒪(log² n/log log n+K) | Theorem 22 We also present efficient algorithms for constructing these data structures.
@InProceedings{loukides_et_al:LIPIcs.CPM.2023.21, author = {Loukides, Grigorios and Pissis, Solon P. and Thankachan, Sharma V. and Zuba, Wiktor}, title = {{Suffix-Prefix Queries on a Dictionary}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {21:1--21:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.21}, URN = {urn:nbn:de:0030-drops-179757}, doi = {10.4230/LIPIcs.CPM.2023.21}, annote = {Keywords: all-pairs suffix-prefix, suffix-prefix queries, internal pattern matching} }
Feedback for Dagstuhl Publishing