Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

Authors Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda



PDF
Thumbnail PDF

File

LIPIcs.MFCS.2016.38.pdf
  • Filesize: 0.86 MB
  • 14 pages

Document Identifiers

Author Details

Yuta Fujishige
Yuki Tsujimaru
Shunsuke Inenaga
Hideo Bannai
Masayuki Takeda

Cite AsGet BibTex

Yuta Fujishige, Yuki Tsujimaru, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets. In 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, pp. 38:1-38:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.MFCS.2016.38

Abstract

The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application to our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
Keywords
  • string algorithms
  • DAWGs
  • suffix trees
  • minimal absent words

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Golnaz Badkobeh, Maxime Crochemore, and Chalita Toopsuwan. Computing the maximal-exponent repeats of an overlap-free string in linear time. In SPIRE 2012, pages 61-72, 2012. Google Scholar
  2. Djamal Belazzougui, Fabio Cunial, Juha Kärkkäinen, and Veli Mäkinen. Versatile succinct representations of the bidirectional burrows-wheeler transform. In Proc. ESA 2013, pages 133-144, 2013. Google Scholar
  3. Anselm Blumer, J. Blumer, David Haussler, Andrzej Ehrenfeucht, M. T. Chen, and Joel I. Seiferas. The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci., 40:31-55, 1985. URL: http://dx.doi.org/10.1016/0304-3975(85)90157-4.
  4. Anselm Blumer, J. Blumer, David Haussler, Ross M. McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578-595, 1987. URL: http://dx.doi.org/10.1145/28869.28873.
  5. Maxime Crochemore. Transducers and repetitions. Theor. Comput. Sci., 45(1):63-86, 1986. Google Scholar
  6. Maxime Crochemore, Gabriele Fici, Robert Mercas, and Solon P. Pissis. Linear-time sequence comparison using minimal absent words & applications. In LATIN 2016, pages 334-346, 2016. Google Scholar
  7. Maxime Crochemore, Filippo Mignosi, and Antonio Restivo. Automata and forbidden words. Inf. Process. Lett., 67(3):111-117, 1998. URL: http://dx.doi.org/10.1016/S0020-0190(98)00104-5.
  8. Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987-1011, 2000. Google Scholar
  9. Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa, Giancarlo Mauri, and Giulio Pavesi. On-line construction of compact directed acyclic word graphs. Discrete Applied Mathematics, 146(2):156-179, 2005. Google Scholar
  10. Moritz G. Maaß. Linear bidirectional on-line construction of affix trees. Algorithmica, 37(1):43-74, 2003. Google Scholar
  11. Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. Google Scholar
  12. Edward M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23(2):262-272, 1976. URL: http://dx.doi.org/10.1145/321941.321946.
  13. Filippo Mignosi, Antonio Restivo, and Marinella Sciortino. Words and forbidden factors. Theor. Comput. Sci., 273(1-2):99-117, 2002. Google Scholar
  14. Kazuyuki Narisawa, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Efficient computation of substring equivalence classes with suffix arrays. In CPM 2007, pages 340-351, 2007. Google Scholar
  15. Jens Stoye. Affix trees. Technical Report Report 2000-04, Universität Bielefeld, 2000. URL: https://www.techfak.uni-bielefeld.de/~stoye/dropbox/report00-04.pdf.
  16. Shiho Sugimoto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Finding absent words from grammar compressed strings. In the Festschrift for Bořivoj Melichar, 2012. Google Scholar
  17. Yuka Tanimura, Yuta Fujishige, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. A faster algorithm for computing maximal α-gapped repeats in a string. In SPIRE 2015, pages 124-136, 2015. Google Scholar
  18. Peter Weiner. Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1-11, 1973. URL: http://dx.doi.org/10.1109/SWAT.1973.13.
  19. Jun-ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda. Faster compact on-line Lempel-Ziv factorization. In STACS 2014, pages 675-686, 2014. Google Scholar
  20. J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337-343, 1977. Google Scholar