Deterministic Indexing for Packed Strings

Authors Philip Bille, Inge Li Gørtz, Frederik Rye Skjoldjensen



PDF
Thumbnail PDF

File

LIPIcs.CPM.2017.6.pdf
  • Filesize: 0.54 MB
  • 11 pages

Document Identifiers

Author Details

Philip Bille
Inge Li Gørtz
Frederik Rye Skjoldjensen

Cite AsGet BibTex

Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen. Deterministic Indexing for Packed Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 6:1-6:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.CPM.2017.6

Abstract

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In the deterministic variant the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the packed variant the strings are stored with several character in a single word, giving us the opportunity to read multiple characters simultaneously. Our main result is a new string index in the deterministic and packed setting. Given a packed string S of length n over an alphabet s, we show how to preprocess S in O(n) (deterministic) time and space O(n) such that given a packed pattern string of length m we can support queries in (deterministic) time O(m/a + log m + log log s), where a = w /log s is the number of characters packed in a word of size w = log n. Our query time is always at least as good as the previous best known bounds and whenever several characters are packed in a word, i.e., log s << w, the query times are faster.
Keywords
  • suffix tree
  • suffix array
  • deterministic algorithm
  • word packing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Djamal Belazzougui. Worst-case efficient single and multiple string matching on packed texts in the word-RAM model. J. Discrete Algorithms, 14:91-106, 2012. URL: http://dx.doi.org/10.1016/j.jda.2011.12.011.
  2. Oren Ben-Kiki, Philip Bille, Dany Breslauer, Leszek Gasieniec, Roberto Grossi, and Oren Weimann. Towards optimal packed string matching. Theor. Comput. Sci., 525:111-129, 2014. URL: http://dx.doi.org/10.1016/j.tcs.2013.06.013.
  3. Philip Bille. Fast searching in packed strings. J. Discrete Algorithms, 9(1):49-56, 2011. URL: http://dx.doi.org/10.1016/j.jda.2010.09.003.
  4. Richard Cole, Tsvi Kopelowitz, and Moshe Lewenstein. Suffix trays and suffix trists: structures for faster text indexing. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming (ICALP 2006), volume 4051 of LNCS, pages 358-369. Springer, 2006. URL: http://dx.doi.org/10.1007/11786986_32.
  5. Martin Farach-Colton, Paolo Ferragina, and Shanmugavelayutham Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987-1011, 2000. URL: http://dx.doi.org/10.1145/355541.355547.
  6. Johannes Fischer and Paweł Gawrychowski. Alphabet-dependent string searching with wexponential search trees. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), volume 9133 of LNCS, pages 160-171. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_14.
  7. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538-544, 1984. URL: http://dx.doi.org/10.1145/828.1884.
  8. Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993. URL: http://dx.doi.org/10.1016/0022-0000(93)90040-4.
  9. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. URL: http://dx.doi.org/10.1017/CBO9780511574931.
  10. Udi Manber and Gene Myers. Suffix arrays: a new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. URL: http://dx.doi.org/10.1137/0222058.
  11. Edward M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23(2):262-272, 1976. URL: http://dx.doi.org/10.1145/321941.321946.
  12. Milan Ružić. Constructing efficient dictionaries in close to sorting time. In Luca Aceto, Ivan Damgård, Leslie Ann Goldberg, Magnús M. Halldórsson, Anna Ingólfsdóttir, and Igor Walukiewicz, editors, Proceedings of the 35th International Colloquium on Automata, Languages, and Programming (ICALP 2008), volume 5125 of LNCS, pages 84-95. Springer, 2008. URL: http://dx.doi.org/10.1007/978-3-540-70575-8_8.
  13. Takuya Takagi, Shunsuke Inenaga, Kunihiko Sadakane, and Hiroki Arimura. Packed compact tries: A fast and efficient data structure for online string processing. In Veli Mäkinen, Simon J. Puglisi, and Leena Salmela, editors, Proceedings of the 27th International Workshop on Combinatorial Algorithms (IWOCA 2016), volume 9843 of LNCS, pages 213-225. Springer, Springer, 2016. URL: http://dx.doi.org/10.1007/978-3-319-44543-4_17.
  14. Peter Weiner. Linear pattern matching algorithms. In H. Raymond Strong, editor, Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pages 1-11. IEEE Computer Society, 1973. URL: http://dx.doi.org/10.1109/SWAT.1973.13.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail