On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Authors Johannes Fischer, Dominik Köppl, Florian Kurpicz



PDF
Thumbnail PDF

File

LIPIcs.CPM.2016.26.pdf
  • Filesize: 0.57 MB
  • 11 pages

Document Identifiers

Author Details

Johannes Fischer
Dominik Köppl
Florian Kurpicz

Cite As Get BibTex

Johannes Fischer, Dominik Köppl, and Florian Kurpicz. On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 26:1-26:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.CPM.2016.26

Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p max (k, lg lg n) + (1+m/p) lg p * lg lg n + occ} time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P', we present a data structure for computing the interval of PP' in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array).

Subject Classification

Keywords
  • parallel algorithms
  • pattern matching
  • approximate string matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. D. Belazzougui and G. Navarro. Alphabet-independent compressed text indexing. ACM Transactions on Algorithms (TALG), 10(4):article 23, 2014. Google Scholar
  2. Dany Breslauer and Zvi Galil. An optimal O(log log n) time parallel string matching algorithm. SIAM J. Comput., 19(6):1051-1058, 1990. Google Scholar
  3. Dany Breslauer and Zvi Galil. A lower bound for parallel string matching. SIAM J. Comput., 21(5):856-862, 1992. Google Scholar
  4. Martin Farach and S. Muthukrishnan. Optimal logarithmic time randomized suffix tree construction. In Proc. ICALP, volume 1099 of LNCS, pages 550-561. Springer, 1996. Google Scholar
  5. Johannes Fischer. Wee LCP. Inform. Process. Lett., 110(8-9):317-320, 2010. Google Scholar
  6. Johannes Fischer. Combined data structure for previous- and next-smaller-values. Theor. Comput. Sci., 412(22):2451-2456, 2011. Google Scholar
  7. Johannes Fischer, Veli Mäkinen, and Gonzalo Navarro. Faster entropy-bounded compressed suffix trees. Theor. Comput. Sci., 410(51):5354-5364, 2009. Google Scholar
  8. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538-544, 1984. Google Scholar
  9. Simon Gog and Johannes Fischer. Advantages of shared data structures for sequences of balanced parentheses. In Proc. DCC, pages 406-415. IEEE Press, 2010. Google Scholar
  10. Trinh ND Huynh, Wing-Kai Hon, Tak-Wah Lam, and Wing-Kin Sung. Approximate string matching using compressed suffix arrays. Theoretical Computer Science, 352(1):240-249, 2006. Google Scholar
  11. Matevz Jekovec and Andrej Brodnik. Parallel query in the suffix tree. CoRR, abs/1509.06167, 2015. URL: http://arxiv.org/abs/1509.06167.
  12. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Parallel external memory suffix sorting. In Proc. CPM, volume 9133 of LNCS, pages 329-342. Springer, 2015. Google Scholar
  13. Tak-Wah Lam, Wing-Kin Sung, and Swee-Seong Wong. Improved Approximate String Matching Using Compressed Suffix Data Structures. Algorithmica, 51(3):298–314, 2007. URL: http://dx.doi.org/10.1007/s00453-007-9104-8.
  14. Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. Google Scholar
  15. J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct representations of permutations and functions. Theor. Comput. Sci., 438:74-88, 2012. Google Scholar
  16. Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. ACM Comput. Surv., 39(1):Article No. 2, 2007. Google Scholar
  17. Enno Ohlebusch, Johannes Fischer, and Simon Gog. CST++. In Proc. SPIRE, volume 6393 of LNCS, pages 322-333. Springer, 2010. Google Scholar
  18. Milan Ružić. Constructing efficient dictionaries in close to sorting time. In Proc. ICALP (1), volume 5125 of LNCS, pages 84-95. Springer, 2008. Google Scholar
  19. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory Comput. Syst, 41(4):589-607, 2007. Google Scholar
  20. Kunihiko Sadakane and Gonzalo Navarro. Fully-functional succinct trees. In Proc. SODA, pages 134-149. ACM/SIAM, 2010. Google Scholar
  21. Marc Snir. On parallel searching. SIAM J. Comput., 14(3):688-708, 1985. URL: http://dx.doi.org/10.1137/0214051.
  22. Esko Ukkonen. Approximate string-matching over suffix trees. In Proc. CPM, volume 684 of LNCS, pages 228-242. Springer, 1993. Google Scholar
  23. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(n). Inform. Process. Lett., 17(2):81-84, 1983. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail