A Framework for Dynamic Parameterized Dictionary Matching

Authors Arnab Ganguly, Wing-Kai Hon, Rahul Shah



PDF
Thumbnail PDF

File

LIPIcs.SWAT.2016.10.pdf
  • Filesize: 0.49 MB
  • 14 pages

Document Identifiers

Author Details

Arnab Ganguly
Wing-Kai Hon
Rahul Shah

Cite As Get BibTex

Arnab Ganguly, Wing-Kai Hon, and Rahul Shah. A Framework for Dynamic Parameterized Dictionary Matching. In 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 53, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.SWAT.2016.10

Abstract

Two equal-length strings S and S' are a parameterized-match (p-match) iff there exists a one-to-one function that renames the characters in S to those in S'. Let P be a collection of d patterns of total length n characters that are chosen from an alphabet Sigma of cardinality sigma. The task is to index P such that we can support the following operations. 

* search(T): given a text T, report all occurrences <j,P_i> such that there exists a pattern P_i in P that is a p-match with the substring T[j,j+|P_i|-1].

* ins(P_i)/del(P_i): modify the index when a pattern P_i is inserted/deleted.

We present a linear-space index that occupies O(n*log n) bits and supports (i) search(T) in worst-case O(|T|*log^2 n + occ) time, where occ is the number of occurrences reported, and (ii) ins(P_i) and del(P_i) in amortized O(|P_i|*polylog(n)) time.
Then, we present a succinct index that occupies (1+o(1))n*log sigma + O(d*log n) bits and supports (i) search(T) in worst-case O(|T|*log^2 n + occ) time, and (ii) ins(P_i) and del(P_i) in amortized O(|P_i|*polylog(n)) time. We also present results related to the semi-dynamic variant of the problem, where deletion is not allowed.

Subject Classification

Keywords
  • Parameterized Dictionary Indexing
  • Generalized Suffix Tree
  • Succinct Data Structures
  • Sparsification

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333-340, 1975. URL: http://dx.doi.org/10.1145/360825.360855.
  2. Stephen Alstrup, Thore Husfeldt, and Theis Rauhe. Marked ancestor problems. In 39th Annual Symposium on Foundations of Computer Science, FOCS'98, November 8-11, 1998, Palo Alto, California, USA, pages 534-544, 1998. URL: http://dx.doi.org/10.1109/SFCS.1998.743504.
  3. Amihood Amir, Martin Farach, Zvi Galil, Raffaele Giancarlo, and Kunsoo Park. Dynamic dictionary matching. J. Comput. Syst. Sci., 49(2):208-222, 1994. URL: http://dx.doi.org/10.1016/S0022-0000(05)80047-9.
  4. Amihood Amir, Martin Farach, Ramana M. Idury, Johannes A. La Poutré, and Alejandro A. Schäffer. Improved dynamic dictionary matching. Inf. Comput., 119(2):258-282, 1995. URL: http://dx.doi.org/10.1006/inco.1995.1090.
  5. Brenda S. Baker. A theory of parameterized pattern matching: algorithms and applications. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA, pages 71-80, 1993. URL: http://dx.doi.org/10.1145/167088.167115.
  6. Djamal Belazzougui. Succinct dictionary matching with no slowdown. In Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings, pages 88-100, 2010. URL: http://dx.doi.org/10.1007/978-3-642-13509-5_9.
  7. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, pages 88-94, 2000. URL: http://dx.doi.org/10.1007/10719839_9.
  8. Sudip Biswas, Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Forbidden extension queries. In 35th IARCS Annual Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2015, December 16-18, 2015, Bangalore, India, pages 320-335, 2015. URL: http://dx.doi.org/10.4230/LIPIcs.FSTTCS.2015.320.
  9. Ho-Leung Chan, Wing-Kai Hon, Tak Wah Lam, and Kunihiko Sadakane. Dynamic dictionary matching and compressed suffix trees. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, Vancouver, British Columbia, Canada, January 23-25, 2005, pages 13-22, 2005. URL: http://dl.acm.org/citation.cfm?id=1070432.1070436.
  10. Paul F. Dietz and Daniel Dominic Sleator. Two algorithms for maintaining order in a list. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 365-372, 1987. URL: http://dx.doi.org/10.1145/28395.28434.
  11. Martin Dietzfelbinger, Anna R. Karlin, Kurt Mehlhorn, Friedhelm Meyer auf der Heide, Hans Rohnert, and Robert Endre Tarjan. Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738-761, 1994. URL: http://dx.doi.org/10.1137/S0097539791194094.
  12. Guy Feigenblat, Ely Porat, and Ariel Shiftan. An improved query time for succinct dynamic dictionary matching. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Moscow, Russia, June 16-18, 2014. Proceedings, pages 120-129, 2014. URL: http://dx.doi.org/10.1007/978-3-319-07566-2_13.
  13. Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12-14 November 2000, Redondo Beach, California, USA, pages 390-398, 2000. URL: http://dx.doi.org/10.1109/SFCS.2000.892127.
  14. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Succinct non-overlapping indexing. In Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Ischia Island, Italy, June 29 - July 1, 2015, Proceedings, pages 185-195, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_16.
  15. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 841-850, 2003. URL: http://dl.acm.org/citation.cfm?id=644108.644250.
  16. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract). In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, May 21-23, 2000, Portland, OR, USA, pages 397-406, 2000. URL: http://dx.doi.org/10.1145/335305.335351.
  17. Wing-Kai Hon, Tak Wah Lam, Rahul Shah, Siu-Lung Tam, and Jeffrey Scott Vitter. Compressed index for dictionary matching. In 2008 Data Compression Conference (DCC 2008), 25-27 March 2008, Snowbird, UT, USA, pages 23-32, 2008. URL: http://dx.doi.org/10.1109/DCC.2008.62.
  18. Wing-Kai Hon, Tak Wah Lam, Rahul Shah, Siu-Lung Tam, and Jeffrey Scott Vitter. Succinct index for dynamic dictionary matching. In Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18, 2009. Proceedings, pages 1034-1043, 2009. URL: http://dx.doi.org/10.1007/978-3-642-10631-6_104.
  19. Ramana M. Idury and Alejandro A. Schäffer. Multiple matching of parameterized patterns. In Combinatorial Pattern Matching, 5th Annual Symposium, CPM 94, Asilomar, California, USA, June 5-8, 1994, Proceedings, pages 226-239, 1994. URL: http://dx.doi.org/10.1007/3-540-58094-8_20.
  20. Tsvi Kopelowitz and Moshe Lewenstein. Dynamic weighted ancestors. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 565-574, 2007. URL: http://dl.acm.org/citation.cfm?id=1283383.1283444.
  21. S. Rao Kosaraju. Faster algorithms for the construction of parameterized suffix trees (preliminary version). In 36th Annual Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, 23-25 October 1995, pages 631-637, 1995. URL: http://dx.doi.org/10.1109/SFCS.1995.492664.
  22. Moshe Lewenstein. Parameterized pattern matching. In Encyclopedia of Algorithms, 2015. URL: http://dx.doi.org/10.1007/978-3-642-27848-8_282-2.
  23. Veli Mäkinen and Gonzalo Navarro. Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms, 4(3), 2008. URL: http://dx.doi.org/10.1145/1367064.1367072.
  24. Edward M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23(2):262-272, 1976. URL: http://dx.doi.org/10.1145/321941.321946.
  25. J. Ian Munro, Gonzalo Navarro, Jesper Sindahl Nielsen, Rahul Shah, and Sharma V. Thankachan. Top- k term-proximity in succinct space. In Algorithms and Computation - 25th International Symposium, ISAAC 2014, Jeonju, Korea, December 15-17, 2014, Proceedings, pages 169-180, 2014. URL: http://dx.doi.org/10.1007/978-3-319-13075-0_14.
  26. Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. ACM Comput. Surv., 39(1), 2007. URL: http://dx.doi.org/10.1145/1216370.1216372.
  27. Gonzalo Navarro and Kunihiko Sadakane. Fully functional static and dynamic succinct trees. ACM Trans. Algorithms, 10(3):16:1-16:39, 2014. URL: http://dx.doi.org/10.1145/2601073.
  28. Mark H. Overmars. The Design of Dynamic Data Structures, volume 156 of Lecture Notes in Computer Science. Springer, 1983. URL: http://dx.doi.org/10.1007/BFb0014927.
  29. Kunihiko Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Algorithms and Computation, 11th International Conference, ISAAC 2000, Taipei, Taiwan, December 18-20, 2000, Proceedings, pages 410-421, 2000. URL: http://dx.doi.org/10.1007/3-540-40996-3_35.
  30. Dekel Tsur. Top-k document retrieval in optimal space. Inf. Process. Lett., 113(12):440-443, 2013. URL: http://dx.doi.org/10.1016/j.ipl.2013.03.012.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail