Longest Common Extensions with Recompression

Author Tomohiro I



PDF
Thumbnail PDF

File

LIPIcs.CPM.2017.18.pdf
  • Filesize: 0.58 MB
  • 15 pages

Document Identifiers

Author Details

Tomohiro I

Cite As Get BibTex

Tomohiro I. Longest Common Extensions with Recompression. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.CPM.2017.18

Abstract

Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers.

Subject Classification

Keywords
  • Longest Common Extension (LCE) queries
  • compressed data structure
  • grammar compressed strings
  • Straight-Line Program (SLP)

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Michael A. Bender, Martin Farach-Colton, Giridhar Pemmasani, Steven Skiena, and Pavel Sumazin. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms, 57(2):75-94, 2005. URL: http://dx.doi.org/10.1016/j.jalgor.2005.08.001.
  2. Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search in grammar-compressed strings. In Akash Lal, S. Akshay, Saket Saurabh, and Sandeep Sen, editors, Proceedings of the 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016), volume 65 of LIPIcs, pages 36:1-36:16. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.FSTTCS.2016.36.
  3. Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Benjamin Sach, Hjalte Wedel Vildhøj, and Søren Vind. Fingerprints in compressed strings. In Frank Dehne, Roberto Solis-Oba, and Jörg-Rüdiger Sack, editors, Proceedings of the 13th International Symposium on Algorithms and Data Structures (WADS 2013), volume 8037 of LNCS, pages 146-157. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-40104-6_13.
  4. Philip Bille, Inge Li Gørtz, Mathias Bæk Tejs Knudsen, Moshe Lewenstein, and Hjalte Wedel Vildhøj. Longest common extensions in sublinear space. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), volume 9133 of LNCS, pages 65-76. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_6.
  5. Philip Bille, Inge Li Gørtz, Benjamin Sach, and Hjalte Wedel Vildhøj. Time-space trade-offs for longest common extensions. J. Discrete Algorithms, 25:42-50, 2014. URL: http://dx.doi.org/10.1016/j.jda.2013.06.003.
  6. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. URL: http://dx.doi.org/10.1017/CBO9780511574931.
  7. Tomohiro I, Wataru Matsubara, Kouji Shimohira, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Kazuyuki Narisawa, and Ayumi Shinohara. Detecting regularities on grammar-compressed strings. Inf. Comput., 240:74-89, 2015. URL: http://dx.doi.org/10.1016/j.ic.2014.09.009.
  8. Artur Jeż. Compressed membership for NFA (DFA) with compressed labels is in NP (P). In Christoph Dürr and Thomas Wilke, editors, Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of LIPIcs, pages 136-147. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2012. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2012.136.
  9. Artur Jeż. Approximation of grammar-based compression via recompression. Theor. Comput. Sci., 592:115-134, 2015. URL: http://dx.doi.org/10.1016/j.tcs.2015.05.027.
  10. Artur Jeż. Faster fully compressed pattern matching by recompression. ACM Trans. Algorithms, 11(3):20:1-20:43, 2015. URL: http://dx.doi.org/10.1145/2631920.
  11. Artur Jeż. One-variable word equations in linear time. Algorithmica, 74(1):1-48, 2016. URL: http://dx.doi.org/10.1007/s00453-014-9931-3.
  12. Artur Jeż. Recompression: A simple and powerful technique for word equations. J. ACM, 63(1):4, 2016. URL: http://dx.doi.org/10.1145/2743014.
  13. Artur Jeż and Markus Lohrey. Approximation of smallest linear tree grammar. In Ernst W. Mayr and Natacha Portier, editors, Proceedings of the 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), volume 25 of LIPIcs, pages 445-457. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2014.445.
  14. Shirou Maruyama, Masaya Nakahara, Naoya Kishiue, and Hiroshi Sakamoto. Esp-index: A compressed index based on edit-sensitive parsing. J. Discrete Algorithms, 18:100-112, 2013. URL: http://dx.doi.org/10.1016/j.jda.2012.07.009.
  15. Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183-198, 1997. URL: http://dx.doi.org/10.1007/BF02522825.
  16. Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Dynamic index and LZ factorization in compressed space. In Jan Holub and Jan Žďárek, editors, Proceedings of the Prague Stringology Conference (PSC 2016), pages 158-170, 2016. URL: http://www.stringology.org/event/2016/p14.html.
  17. Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Fully dynamic data structure for LCE queries in compressed space. In Piotr Faliszewski, Anca Muscholl, and Rolf Niedermeier, editors, Proceedings of the 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016), volume 58 of LIPIcs, pages 72:1-72:15. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.MFCS.2016.72.
  18. Nicola Prezza. In-place longest common extensions, 2017. URL: http://arxiv.org/abs/1608.05100v9.
  19. Simon J. Puglisi and Andrew Turpin. Space-time tradeoffs for longest-common-prefix array computation. In Seok-Hee Hong, Hiroshi Nagamochi, and Takuro Fukunaga, editors, Proceedings of the 19th International Symposium on Algorithms and Computation (ISAAC 2008), volume 5369 of LNCS, pages 124-135. Springer, 2008. URL: http://dx.doi.org/10.1007/978-3-540-92182-0_14.
  20. Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211-222, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00777-6.
  21. Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, and Masayuki Takeda. Deterministic sub-linear space LCE data structures with efficient construction. In Roberto Grossi and Moshe Lewenstein, editors, Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), volume 54 of LIPIcs, pages 1:1-1:10. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2016.1.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail