eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2016-08-19
38:1
38:14
10.4230/LIPIcs.MFCS.2016.38
article
Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets
Fujishige, Yuta
Tsujimaru, Yuki
Inenaga, Shunsuke
Bannai, Hideo
Takeda, Masayuki
The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA that recognizes all suffixes of y, and it has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y over an integer alphabet of polynomial size in n. We also show that a straightforward modification of our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure that supports bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
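To make the DAWG definition in the abstract concrete, the following is a minimal Python sketch of the textbook online construction of a suffix automaton. It is not the linear-time algorithm for integer alphabets that the paper presents: it stores edges in hash maps, so its running time depends on dictionary operations rather than on the paper's technique. The class name `Dawg` and all of its fields and methods are hypothetical names chosen for illustration.

```python
class Dawg:
    """DAWG (suffix automaton) of a sequence, built by the classic online method.

    Assumption: this is an illustrative sketch, not the paper's O(n)-time
    algorithm for integer alphabets. Edges are kept in Python dicts.
    """

    def __init__(self, text):
        self.next = [{}]    # next[v][c] = target of the edge labelled c out of v
        self.link = [-1]    # suffix link of v (-1 for the source node 0)
        self.length = [0]   # length of the longest string that reaches v
        self.last = 0       # node reached by the whole text read so far
        for symbol in text:
            self._extend(symbol)
        # Accepting nodes are those on the suffix-link path from the last node.
        self.final = set()
        v = self.last
        while v != -1:
            self.final.add(v)
            v = self.link[v]

    def _new_node(self, length, edges, link):
        self.next.append(edges)
        self.link.append(link)
        self.length.append(length)
        return len(self.next) - 1

    def _extend(self, symbol):
        cur = self._new_node(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add the new edge along the suffix-link path until one already exists.
        while p != -1 and symbol not in self.next[p]:
            self.next[p][symbol] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][symbol]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings that reach q.
                clone = self._new_node(self.length[p] + 1,
                                       dict(self.next[q]), self.link[q])
                while p != -1 and self.next[p].get(symbol) == q:
                    self.next[p][symbol] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def accepts(self, word):
        """Return True exactly when word is a suffix of the indexed text."""
        v = 0
        for symbol in word:
            if symbol not in self.next[v]:
                return False
            v = self.next[v][symbol]
        return v in self.final

    def num_nodes(self):
        return len(self.next)

    def num_edges(self):
        return sum(len(edges) for edges in self.next)
```

For example, `Dawg([3, 1, 3, 1, 1])` indexes a string over an integer alphabet; `accepts([3, 1, 1])` holds because that sequence is a suffix, while `accepts([3, 1, 3])` does not. The node and edge counts stay within the known bounds of 2n - 1 nodes and 3n - 4 edges for n ≥ 3, consistent with the O(n) size stated in the abstract.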
https://drops.dagstuhl.de/storage/00lipics/lipics-vol058-mfcs2016/LIPIcs.MFCS.2016.38/LIPIcs.MFCS.2016.38.pdf
string algorithms
DAWGs
suffix trees
minimal absent words