LIPIcs.CPM.2022.27.pdf
- Filesize: 0.97 MB
- 17 pages
A string w is called a minimal absent word for another string T if w does not occur (as a substring) in T and all proper substrings of w occur in T. State-of-the-art data structures for reporting the set MAW(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|MAW(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by a^p where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets ℳ₁, ℳ₂, ℳ₃, ℳ₄, ℳ₅ using RLE, we present matching upper and lower bounds for the number of MAWs in ℳ_i for i = 1,2,4,5 in terms of RLE-size m, except for ℳ₃ whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|MAW(T)|) time.
Feedback for Dagstuhl Publishing