Update Query Time Trade-Off for Dynamic Suffix Arrays

Authors Amihood Amir, Itai Boneh



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2020.63.pdf
  • Filesize: 0.51 MB
  • 16 pages

Document Identifiers

Author Details

Amihood Amir
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Itai Boneh
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel

Cite AsGet BibTex

Amihood Amir and Itai Boneh. Update Query Time Trade-Off for Dynamic Suffix Arrays. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 63:1-63:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ISAAC.2020.63

Abstract

The Suffix Array SA(S) of a string S[1 … n] is an array containing all the suffixes of S sorted by lexicographic order. The suffix array is one of the most well known indexing data structures, and it functions as a key tool in many string algorithms. In this paper, we present a data structure for maintaining the Suffix Array of a dynamic string. For every 1 ≤ k ≤ n, our data structure reports SA[i] in 𝒪̃(n/k) time and handles text modification in 𝒪̃(k) time. Additionally, our data structure enables the same query time for reporting iSA[i], with iSA being the Inverse Suffix Array of S[1 … n]. Our data structure can be used to construct sub-linear dynamic variants of static strings algorithms or data structures that are based on the Suffix Array and the Inverse Suffix Array.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Sorting and searching
Keywords
  • String Algorithms
  • Dynamic Algorithms
  • Suffix Array
  • Inverse Suffix Array

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. D. Adjeroh, T. Bell, and A. Mukherjee. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer-Verlag, 2008. ISBN 00387789081. Google Scholar
  2. A. Amir, G. Benson, and M. Farach. Let sleeping files lie: Pattern matching in z-compressed files. Journal of Computer and System Sciences, 52(2):299-307, 1996. Google Scholar
  3. A. Amir, I. Boneh, P. Charalampopoulos, and E. Kondratovsky. Repetition detection in a dynamic string. In Proc. 27th Annual European Symposium on Algorithms (ESA), LIPIcs, pages 5:1-5:18, September 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.5.
  4. A. Amir, P. Charalampopoulos, C.S. Iliopoulos, S.P. Pissis, and J. Radoszewski. Longest common factor after one edit operation. In Proc. 24th International Symposium on String Processing and Information Retrieval (SPIRE), LNCS, pages 14-26. Springer, 2017. best paper award. Google Scholar
  5. A. Amir, P. Charalampopoulos, S. P. Pissis, and J. Radoszewski. Longest common factor made fully dynamic. Technical Report abs/1804.08731, CoRR, April 2018. URL: http://arxiv.org/abs/1804.08731.
  6. A. Amir and M. Farach. Adaptive dictionary matching. Proc. 32nd IEEE FOCS, pages 760-766, 1991. Google Scholar
  7. A. Amir, M. Farach, R.M. Idury, J.A. La Poutré, and A.A Schäffer. Improved dynamic dictionary matching. Information and Computation, 119(2):258-282, 1995. Google Scholar
  8. A. Amir, G.M. Landau, M. Lewenstein, and D. Sokol. Dynamic text and static pattern matching. ACM Transactions on Algorithms, 3(2), 2007. Google Scholar
  9. A. Apostolico. Fast gapped variants for lempel-ziv-welch compression. Information and Computation, 205(7):1012-1026, 2007. Google Scholar
  10. A. Apostolico, D. Breslauer, and Z. Galil. Optimal parallel algorithms for periods, palindromes and squares. In Proc. 19th International Colloquium on Automata, Languages, and Programming (ICALP), volume 623 of LNCS, pages 296-307. Springer, 1992. Google Scholar
  11. A. Apostolico, M. Lewenstein, and P. Erdös. Parameterized matching with mismatches. Journal of Discrete Algorithms, 5(1):135-140, 2007. Google Scholar
  12. B. S. Baker. Parameterized pattern matching: Algorithms and applications. Journal of Computer and System Sciences, 52(1):28-42, 1996. Google Scholar
  13. B. S. Baker. Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM Journal on Computing, 26(5):1343-1362, 1997. Google Scholar
  14. B. Balkenhol and S. Kurtz. Universal data compression based on the burrows-wheeler transformation: Theorey and practice. IEEE Trans. on Computers, 49(10):1043-1053, 2000. Google Scholar
  15. G. Benson. Tandem repeats finder: a program to analyze dna sequence. Nucleic Acids Research, 27(2):573-580, 1999. Google Scholar
  16. L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In Proc, 7th Symposium on String Processing and Information Retrieval (SPIRE), pages 39-48, 2000. Google Scholar
  17. P. Bille, A. R. Christiansen, P. H. Cording, I. L. Gørtz, F. R. Skjoldjensen, H. W. Vildhøj, and S. Vind. Dynamic relative compression, dynamic partial sums, and substring concatenation. Algorithmica, 16(4):464-497, 2017. Google Scholar
  18. P. Charalampopoulos, P. Gawrychowski, and K. Pokorski. Dynamic longest common substring in polylogarithmic time. In Proc. 47th International Colloquium on Automata, Languages and Programming (ICALP)@@@, 2020. to appear. Google Scholar
  19. M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. Google Scholar
  20. C. Demetrescu, D. Eppstein, Z. Galil, and G. Italiano. Algorithms and Theory of Computation Handbook, chapter Dynamic Graph Algorithms, pages 9-9. Chapman & Hall/CRC, 2010. URL: http://dl.acm.org/citation.cfm?id=1882757.1882766.
  21. N. O. Domaniç and F. P. Preparata. A novel approach to the detection of genomic approximate tandem repeats in the levenshtein metric. Journal of Computational Biology, 14(7):873-891, 2007. Google Scholar
  22. M. Farach and M. Thorup. String matching in lempel-ziv compressed strings. Proc. 27th Annual ACM Symposium on the Theory of Computing, pages 703-712, 1995. Google Scholar
  23. P. Gawrychowski, A. Karczmarz, T. Kociumaka, J. Lacki, and P. Sankowski. Optimal dynamic strings. In Proc. 29th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1509-1528, 2018. Google Scholar
  24. M. Gu, M. Farach, and R. Beigel. An efficient algorithm for dynamic text indexing. Proc. 5th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 697-704, 1994. Google Scholar
  25. C. Hazay, M. Lewenstein, and D. Sokol. Approximate parameterized matching. In Proc. 12th Annual European Symposium on Algorithms (ESA 2004), pages 414-425, 2004. Google Scholar
  26. C. Hohlweg and C. Reutenauer. Lyndon words, permutations and trees. Theoretical Computer Science, 307(1):173-178, 2003. Google Scholar
  27. R.M. Idury and A.A Schäffer. Dynamic dictionary matching with failure functions. Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, pages 273-284, 1992. Google Scholar
  28. H. Kaplan, E. Porat, and N. Shafrir. Finding the position of the k-mismatch and approximate tandem repeats. In Scandinavian Workshop on Algorithm Theory (SWAT06), pages 90-101, 2006. Google Scholar
  29. T. Kociumaka. Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, October 2018. URL: https://www.mimuw.edu.pl/~kociumaka/files/phd.pdf.
  30. G. Kucherov and D. Sokol. Approximate Tandem Repeats, pages 106-109. Springer, 2016. Google Scholar
  31. G. Landau and J. Schmidt. An algorithm for approximate tandem repeats. In Proc. 4th Annual Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science, volume 648, pages 120-133. Springer-Verlag, 1993. Google Scholar
  32. G. M. Landau, J. P. Schmidt, and D. Sokol. An algorithm for approximate tandem repeats. Journal of Computational Biology, 8(1):1-18, 2001. Google Scholar
  33. G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43:239-249, 1986. Google Scholar
  34. G. M. Landau and U. Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989. Google Scholar
  35. T. Lee, J.C. Na, and K. Park. On-line construction of parameterized suffix trees. Information Processing Letters, 111(5):201-207, 2011. Google Scholar
  36. J. Loving, J. P. Scaduto, and G. Benson. An SIMD algorithm for wraparound tandem alignment. In Proc. 13th International Symposium on Bioinformatics Research and Applications (ISBRA), pages 140-149, 2017. Google Scholar
  37. U. Manber and G. Myers. Suffix array: A new method for on-line string searches. In Proc. 1st ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 319-327, 1990. Google Scholar
  38. K. Mehlhorn, R. Sundar, and C. Uhrig. Maintaining dynamic sequences under equality-tests in polylogarithmic time. In Proc. 5th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 213-222, 1994. Google Scholar
  39. G. Navarro and M. Raffinot. A general practical approach to pattern matching over ziv-lempel compressed text. In Proc. 10th Annual Symposium on Combinatorial Pattern Matching (CPM 99), pages 14-36, 1999. LNCS 1645. Google Scholar
  40. T. Nishimoto, T. I, S. Inenaga, H. Bannai, and M. Takeda. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In 41st Intl. Symp. Mathematical Foundations of Computer Science (MFCS), volume 58 of LIPIcs, pages 72:1-72:15, 2016. Google Scholar
  41. T. Rauhe S. Alstrup, G. S. Brodal. Pattern matchiong in dynamic texts. In Proc. 11th ACM-SIAM Symposium on Discrete algorithms(SODA), pages 819-828, 2000. Google Scholar
  42. K. Sadakane. A fast algorithm for making suffix arrays and for burrows-wheeler transformation. In Proc. Data Compression Conference (DCC), pages 129-138, 1998. Google Scholar
  43. S. C. Sahinalp and U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. Proc. 37th FOCS, pages 320-328, 1996. Google Scholar
  44. M. Salson, T. Lecroq, M Léonard, and L. Mouchard. A four-stage algorithm for updating a burrows-wheeler transform. Theoretcial Computer Science, 410(43):4350-4359, 2009. Google Scholar
  45. M. Salson, T. Lecroq, M Léonard, and L. Mouchard. Dynamic extended suffix arrays. J. Discrete Algorithms, 8:241-257, 2010. Google Scholar
  46. D. Sokol, G. Benson, and J. Tojeira. Tandem repeats over the edit distance. Bioinformatics, 23(2):30-35, 2007. Google Scholar
  47. P. Weiner. Linear pattern matching algorithm. Proc. 14 IEEE Symposium on Switching and Automata Theory, pages 1-11, 1973. Google Scholar
  48. T. A. Welch. A technique for high-performance data compression. IEEE Computer, 17:8-19, 1984. Google Scholar
  49. Y. Wexler, Z. Yakhini, Y. Kashi, and D. Geiger. Finding approximate tandem repeats in genomic sequences. Journal of Computational Biology, 12(7):928-942, 2005. Google Scholar
  50. J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Trans. on Information Theory, IT-23(3):337-343, 1977. Google Scholar
  51. J. Ziv and A. Lempel. Compression of individual sequences via variable rate coding. IEEE Trans. on Information Theory, IT-24:530-536, 1978. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail