Faster External Memory LCP Array Construction

Authors Juha Kärkkäinen, Dominik Kempa




File

LIPIcs.ESA.2016.61.pdf
  • Filesize: 0.66 MB
  • 16 pages


Cite As

Juha Kärkkäinen and Dominik Kempa. Faster External Memory LCP Array Construction. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 61:1-61:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ESA.2016.61

Abstract

The suffix array, perhaps the most important data structure in modern string processing, needs to be augmented with the longest-common-prefix (LCP) array in many applications. Constructing the two arrays is often a major bottleneck, especially when the data is too big for internal memory. We describe two new algorithms for computing the LCP array from the suffix array in external memory. Experiments demonstrate that the new algorithms are about a factor of two faster than the fastest previous algorithm.

Keywords
  • LCP array
  • suffix array
  • external memory algorithms
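
To make the task in the abstract concrete, the following minimal sketch computes an LCP array from a suffix array entirely in internal memory, using the classic linear-time approach of Kasai et al. It is not either of the external memory algorithms described in the paper. The function names `suffix_array` and `lcp_array` are illustrative assumptions, and the suffix array is built by naive sorting only to keep the example self-contained.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically.

    Illustration only; practical constructions are far more efficient.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """In-memory LCP construction in the style of Kasai et al.

    lcp[r] is the length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is defined as 0.
    """
    n = len(text)
    # rank[i] is the position of suffix i in the suffix array.
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r

    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):
        if rank[i] == 0:
            # The lexicographically smallest suffix has no predecessor.
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        # Moving from suffix i to suffix i + 1 loses at most one match.
        if h > 0:
            h -= 1
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)
    print(lcp_array(text, sa))
```

The sketch makes random accesses to `text`, `sa`, and `rank`, which is cheap in internal memory but expensive once the data lives on disk; that cost is the bottleneck the abstract refers to.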

