DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures

Authors: Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

Author Details

Katsuhito Nakashima
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Noriki Fujisato
  • Department of Informatics, Kyushu University, Fukuoka, Japan
Diptarama Hendrian
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Yuto Nakashima
  • Department of Informatics, Kyushu University, Fukuoka, Japan
Ryo Yoshinaka
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Shunsuke Inenaga
  • Department of Informatics, Kyushu University, Fukuoka, Japan
  • PRESTO, Japan Science and Technology Agency, Kawaguchi, Japan
Hideo Bannai
  • M&D Data Science Center, Tokyo Medical and Dental University, Tokyo, Japan
Ayumi Shinohara
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Masayuki Takeda
  • Department of Informatics, Kyushu University, Fukuoka, Japan

Cite As

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, and Masayuki Takeda. DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CPM.2020.26

Abstract

Two strings x and y over Σ ∪ Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f: Σ ∪ Π → Σ ∪ Π that is the identity on Σ and transforms x to y (or vice versa). The p-matching problem is to find the substrings of a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs), which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n²) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of the input string. We also give an O(n |Π| log (|Π| + |Σ|))-time, O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, we show that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner.
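
As an illustration of the p-matching definition above (and not of the paper's PDAWG construction), the sketch below tests p-matches with prev-encoding, the standard encoding from Baker's work on parameterized matching: two strings of equal length p-match exactly when their prev-encodings are equal. The function names and the `static` argument (standing for Σ) are hypothetical, chosen only for this example; every symbol not in `static` is assumed to be a parameter symbol from Π. The search routine is a naive scan that re-encodes each window, so it is far slower than an index such as the PDAWG.

```python
def prev_encode(s, static):
    """Prev-encoding of s.

    Static symbols (those in `static`, i.e. Sigma) are kept as-is.
    Each parameter symbol is replaced by the distance to its previous
    occurrence, or by 0 if it has not occurred before.
    """
    last = {}  # parameter symbol -> index of its most recent occurrence
    out = []
    for i, c in enumerate(s):
        if c in static:
            out.append(c)
        else:
            out.append(i - last[c] if c in last else 0)
            last[c] = i
    return out


def p_match(x, y, static):
    """True if x and y have equal length and equal prev-encodings."""
    return len(x) == len(y) and prev_encode(x, static) == prev_encode(y, static)


def p_match_positions(text, pattern, static):
    """Naive search: start positions of substrings of text that p-match pattern."""
    m = len(pattern)
    target = prev_encode(pattern, static)
    return [
        i
        for i in range(len(text) - m + 1)
        if prev_encode(text[i:i + m], static) == target
    ]
```

For instance, with `static = {"a", "b"}`, the strings `"xaxy"` and `"yayx"` p-match (rename x to y and y to x), while `"xaxy"` and `"xaxx"` do not, because no bijection can send both x and y to x.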

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • parameterized matching
  • suffix trees
  • DAWGs
  • suffix automata
