Structural Pattern Matching - Succinctly

Authors Arnab Ganguly, Rahul Shah, Sharma V. Thankachan



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2017.35.pdf
  • Filesize: 0.62 MB
  • 13 pages

Document Identifiers

Author Details

Arnab Ganguly
Rahul Shah
Sharma V. Thankachan

Cite As Get BibTex

Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural Pattern Matching - Succinctly. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 35:1-35:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ISAAC.2017.35

Abstract

Let T be a text of length n containing characters from an alphabet \Sigma, which is the union of two disjoint sets:  \Sigma_s containing static characters (s-characters) and \Sigma_p containing parameterized characters (p-characters). 
Each character in \Sigma_p has an associated complementary character from \Sigma_p.
A pattern P (also over \Sigma) matches an equal-length substring $S$ of T iff the s-characters match exactly, there exists a one-to-one function that renames the p-characters in S to the p-characters in P, and if a p-character x is renamed to another p-character y then the complement of x is renamed to the complement of y. 
The task is to find the starting positions (occurrences) of all such substrings S.
Previous indexing solution [Shibuya, SWAT 2000], known as Structural  Suffix Tree, requires \Theta(n\log n) bits of space, and can find all occ occurrences in time O(|P|\log \sigma+ occ), where \sigma = |\Sigma|.
In this paper, we present the first succinct index for this problem, which occupies n \log \sigma + O(n) bits and offers O(|P|\log\sigma+ occ\cdot  \log n \log\sigma) query time.

Subject Classification

Keywords
  • Parameterized Pattern Matching
  • Suffix tree
  • Burrows-Wheeler Transform
  • Wavelet Tree
  • Fully-functional succinct tree

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Brenda S. Baker. A theory of parameterized pattern matching: algorithms and applications. In S. Rao Kosaraju, David S. Johnson, and Alok Aggarwal, editors, Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA, pages 71-80. ACM, 1993. URL: http://dx.doi.org/10.1145/167088.167115.
  2. Djamal Belazzougui and Gonzalo Navarro. Alphabet-independent compressed text indexing. ACM Trans. Algorithms, 10(4):23:1-23:19, 2014. URL: http://dx.doi.org/10.1145/2635816.
  3. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, Digital Equipment Corporation (now part of Hewlett-Packard, Palo Alto, CA), 1994. Google Scholar
  4. Richard Cole and Ramesh Hariharan. Faster suffix tree construction with missing suffix links. SIAM J. Comput., 33(1):26-42, 2003. URL: http://dx.doi.org/10.1137/S0097539701424465.
  5. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Order-preserving incomplete suffix trees and order-preserving indexes. In Oren Kurland, Moshe Lewenstein, and Ely Porat, editors, String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Jerusalem, Israel, October 7-9, 2013, Proceedings, volume 8214 of Lecture Notes in Computer Science, pages 84-95. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-319-02432-5_13.
  6. Paolo Ferragina and Giovanni Manzini. Indexing compressed text. J. ACM, 52(4):552-581, 2005. URL: http://dx.doi.org/10.1145/1082036.1082039.
  7. Johannes Fischer. Wee LCP. Inf. Process. Lett., 110(8-9):317-320, 2010. URL: http://dx.doi.org/10.1016/j.ipl.2010.02.010.
  8. Johannes Fischer and Volker Heun. A new succinct representation of rmq-information and improvements in the enhanced suffix array. In Bo Chen, Mike Paterson, and Guochuan Zhang, editors, Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, First International Symposium, ESCAPE 2007, Hangzhou, China, April 7-9, 2007, Revised Selected Papers, volume 4614 of Lecture Notes in Computer Science, pages 459-470. Springer, 2007. URL: http://dx.doi.org/10.1007/978-3-540-74450-4_41.
  9. Arnab Ganguly, Rahul Shah, and Sharma V Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 397-407. Society for Industrial and Applied Mathematics, 2017. Google Scholar
  10. Raffaele Giancarlo. The suffix of a square matrix, with applications. In Vijaya Ramachandran, editor, Proceedings of the Fourth Annual ACM/SIGACT-SIAM Symposium on Discrete Algorithms, 25-27 January 1993, Austin, Texas., pages 402-411. ACM/SIAM, 1993. Google Scholar
  11. Alexander Golynski, Roberto Grossi, Ankur Gupta, Rajeev Raman, and S. Srinivasa Rao. On the size of succinct indices. In Lars Arge, Michael Hoffmann, and Emo Welzl, editors, Algorithms - ESA 2007, 15th Annual European Symposium, Eilat, Israel, October 8-10, 2007, Proceedings, volume 4698 of Lecture Notes in Computer Science, pages 371-382. Springer, 2007. URL: http://dx.doi.org/10.1007/978-3-540-75520-3_34.
  12. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 841-850. ACM/SIAM, 2003. URL: http://dl.acm.org/citation.cfm?id=644108.644250.
  13. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput., 35(2):378-407, 2005. URL: http://dx.doi.org/10.1137/S0097539702402354.
  14. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. Google Scholar
  15. Dong Kyue Kim, Yoo Ah Kim, and Kunsoo Park. Generalizations of suffix arrays to multi-dimensional matrices. Theor. Comput. Sci., 302(1-3):223-238, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00828-9.
  16. Gonzalo Navarro. Spaces, trees, and colors: The algorithmic landscape of document retrieval on sequences. ACM Comput. Surv., 46(4):52:1-52:47, 2013. URL: http://dx.doi.org/10.1145/2535933.
  17. Gonzalo Navarro. Wavelet trees for all. J. Discrete Algorithms, 25:2-20, 2014. URL: http://dx.doi.org/10.1016/j.jda.2013.07.004.
  18. Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. ACM Comput. Surv., 39(1), 2007. URL: http://dx.doi.org/10.1145/1216370.1216372.
  19. Gonzalo Navarro and Kunihiko Sadakane. Fully functional static and dynamic succinct trees. ACM Trans. Algorithms, 10(3):16:1-16:39, 2014. URL: http://dx.doi.org/10.1145/2601073.
  20. Luís M. S. Russo, Gonzalo Navarro, and Arlindo L. Oliveira. Fully compressed suffix trees. ACM Transactions on Algorithms, 7(4):53, 2011. URL: http://dx.doi.org/10.1145/2000807.2000821.
  21. Tetsuo Shibuya. Generalization of a suffix tree for RNA structural pattern matching. In Magnús M. Halldórsson, editor, Algorithm Theory - SWAT 2000, 7th Scandinavian Workshop on Algorithm Theory, Bergen, Norway, July 5-7, 2000, Proceedings, volume 1851 of Lecture Notes in Computer Science, pages 393-406. Springer, 2000. URL: http://dx.doi.org/10.1007/3-540-44985-X_34.
  22. Tetsuo Shibuya. Geometric suffix tree: Indexing protein 3-d structures. J. ACM, 57(3):15:1-15:17, 2010. URL: http://dx.doi.org/10.1145/1706591.1706595.
  23. Dekel Tsur. Top-k document retrieval in optimal space. Inf. Process. Lett., 113(12):440-443, 2013. URL: http://dx.doi.org/10.1016/j.ipl.2013.03.012.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail