On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Fischer, Johannes; Köppl, Dominik; Kurpicz, Florian

doi:10.4230/LIPIcs.CPM.2016.26

File

LIPIcs.CPM.2016.26.pdf

Filesize: 0.57 MB
11 pages

Document Identifiers

DOI: 10.4230/LIPIcs.CPM.2016.26
URN: urn:nbn:de:0030-drops-60669

Author Details

Johannes Fischer

Dominik Köppl

Florian Kurpicz

Cite AsGet BibTex

Johannes Fischer, Dominik Köppl, and Florian Kurpicz. On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 26:1-26:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.CPM.2016.26

Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p max (k, lg lg n) + (1+m/p) lg p * lg lg n + occ} time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P', we present a data structure for computing the interval of PP' in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array).

Keywords

parallel algorithms
pattern matching
approximate string matching

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

D. Belazzougui and G. Navarro. Alphabet-independent compressed text indexing. ACM Transactions on Algorithms (TALG), 10(4):article 23, 2014.
Dany Breslauer and Zvi Galil. An optimal O(log log n) time parallel string matching algorithm. SIAM J. Comput., 19(6):1051-1058, 1990.
Dany Breslauer and Zvi Galil. A lower bound for parallel string matching. SIAM J. Comput., 21(5):856-862, 1992.
Martin Farach and S. Muthukrishnan. Optimal logarithmic time randomized suffix tree construction. In Proc. ICALP, volume 1099 of LNCS, pages 550-561. Springer, 1996.
Johannes Fischer. Wee LCP. Inform. Process. Lett., 110(8-9):317-320, 2010.
Johannes Fischer. Combined data structure for previous- and next-smaller-values. Theor. Comput. Sci., 412(22):2451-2456, 2011.
Johannes Fischer, Veli Mäkinen, and Gonzalo Navarro. Faster entropy-bounded compressed suffix trees. Theor. Comput. Sci., 410(51):5354-5364, 2009.
Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538-544, 1984.
Simon Gog and Johannes Fischer. Advantages of shared data structures for sequences of balanced parentheses. In Proc. DCC, pages 406-415. IEEE Press, 2010.
Trinh ND Huynh, Wing-Kai Hon, Tak-Wah Lam, and Wing-Kin Sung. Approximate string matching using compressed suffix arrays. Theoretical Computer Science, 352(1):240-249, 2006.
Matevz Jekovec and Andrej Brodnik. Parallel query in the suffix tree. CoRR, abs/1509.06167, 2015. URL: http://arxiv.org/abs/1509.06167.
Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Parallel external memory suffix sorting. In Proc. CPM, volume 9133 of LNCS, pages 329-342. Springer, 2015.
Tak-Wah Lam, Wing-Kin Sung, and Swee-Seong Wong. Improved Approximate String Matching Using Compressed Suffix Data Structures. Algorithmica, 51(3):298–314, 2007. URL: http://dx.doi.org/10.1007/s00453-007-9104-8.
Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993.
J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. Succinct representations of permutations and functions. Theor. Comput. Sci., 438:74-88, 2012.
Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. ACM Comput. Surv., 39(1):Article No. 2, 2007.
Enno Ohlebusch, Johannes Fischer, and Simon Gog. CST++. In Proc. SPIRE, volume 6393 of LNCS, pages 322-333. Springer, 2010.
Milan Ružić. Constructing efficient dictionaries in close to sorting time. In Proc. ICALP (1), volume 5125 of LNCS, pages 84-95. Springer, 2008.
Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory Comput. Syst, 41(4):589-607, 2007.
Kunihiko Sadakane and Gonzalo Navarro. Fully-functional succinct trees. In Proc. SODA, pages 134-149. ACM/SIAM, 2010.
Marc Snir. On parallel searching. SIAM J. Comput., 14(3):688-708, 1985. URL: http://dx.doi.org/10.1137/0214051.
Esko Ukkonen. Approximate string-matching over suffix trees. In Proc. CPM, volume 684 of LNCS, pages 228-242. Springer, 1993.
Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(n). Inform. Process. Lett., 17(2):81-84, 1983.