Faster Longest Common Extension Queries in Strings over General Alphabets

Authors Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, Tomasz Walen



PDF
Thumbnail PDF

File

LIPIcs.CPM.2016.5.pdf
  • Filesize: 0.56 MB
  • 13 pages

Document Identifiers

Author Details

Pawel Gawrychowski
Tomasz Kociumaka
Wojciech Rytter
Tomasz Walen

Cite As Get BibTex

Pawel Gawrychowski, Tomasz Kociumaka, Wojciech Rytter, and Tomasz Walen. Faster Longest Common Extension Queries in Strings over General Alphabets. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.CPM.2016.5

Abstract

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in O(q log log n + n log* n) time making only O(q + n) symbol comparisons. Consequently, all runs in a string over a general ordered alphabets can be computed in O(n log log n) time making O(n) symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who designed an algorithm with O(n log^⅔ n) running time and conjectured that O(n) time is possible. Our paper makes a significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to O(q log n + n log* n). The main tools are difference covers and a variant of the disjoint-sets data structure by La Poutré (SODA 1990).

Subject Classification

Keywords
  • longest common extension
  • longest common prefix
  • maximal repetitions
  • difference cover

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. A new characterization of maximal repetitions by Lyndon trees. In Piotr Indyk, editor, 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 562-571. SIAM, 2015. URL: http://dx.doi.org/10.1137/1.9781611973730.38.
  2. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The "runs" theorem, 2015. URL: http://arxiv.org/abs/1406.0263v7.
  3. Philip Bille, Paweł Gawrychowski, Inge Li Gørtz, Gad M. Landau, and Oren Weimann. Longest common extensions in trees. Theor. Comput. Sci., 2015. In press. URL: http://dx.doi.org/10.1016/j.tcs.2015.08.009.
  4. Philip Bille, Inge Li Gørtz, Mathias Bæk Tejs Knudsen, Moshe Lewenstein, and Hjalte Wedel Vildhøj. Longest common extensions in sublinear space. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Combinatorial Pattern Matching, CPM 2015, volume 9133 of LNCS, pages 65-76. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_6.
  5. Philip Bille, Inge Li Gørtz, Benjamin Sach, and Hjalte Wedel Vildhøj. Time-space trade-offs for longest common extensions. J. Discrete Algorithms, 25:42-50, 2014. URL: http://dx.doi.org/10.1016/j.jda.2013.06.003.
  6. Dany Breslauer. Efficient String Algorithmics. PhD thesis, Columbia University, 1992. URL: http://www.cs.columbia.edu/~library/theses/breslauer.ps.gz.
  7. Stefan Burkhardt and Juha Kärkkäinen. Fast lightweight suffix array construction and checking. In Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore, editors, Combinatorial Pattern Matching, CPM 2003, volume 2676 of LNCS, pages 55-69. Springer, 2003. URL: http://dx.doi.org/10.1007/3-540-44888-8_5.
  8. Paweł Gawrychowski, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Universal reconstruction of a string. In Frank Dehne, Jörg-Rüdiger Sack, and Ulrike Stege, editors, Algorithms and Data Structures, WADS 2015, volume 9214 of LNCS, pages 386-397. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21840-3_32.
  9. Shunsuke Inenaga. A faster longest common extension algorithm on compressed strings and its applications. In Jan Holub and Jan Žďárek, editors, Prague Stringology Conference 2015, pages 1-4. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2015. URL: http://www.stringology.org/event/2015/p01.html.
  10. Marek Karpiński, Wojciech Rytter, and Ayumi Shinohara. An efficient pattern-matching algorithm for strings with short descriptions. Nord. J. Comput., 4(2):172-186, 1997. Google Scholar
  11. Dmitry Kosolobov. Finding the leftmost critical factorization on unordered alphabet, 2015. URL: http://arxiv.org/abs/1509.01018.
  12. Dmitry Kosolobov. Lempel-Ziv factorization may be harder than computing all runs. In Ernst W. Mayr and Nicolas Ollinger, editors, Theoretical Aspects of Computer Science, STACS 2015, volume 30 of LIPIcs, pages 582-593. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2015. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2015.582.
  13. Dmitry Kosolobov. Online detection of repetitions with backtracking. In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Combinatorial Pattern Matching, CPM 2015, volume 9133 of LNCS, pages 295-306. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_25.
  14. Dmitry Kosolobov. Computing runs on a general alphabet. Information Processing Letters, 116(3):241-244, 2016. URL: http://dx.doi.org/10.1016/j.ipl.2015.11.016.
  15. Gad M. Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. J. Algorithms, 10(2):157-169, 1989. URL: http://dx.doi.org/10.1016/0196-6774(89)90010-2.
  16. Yury Lifshits. Processing compressed texts: A tractability border. In Bin Ma and Kaizhong Zhang, editors, Combinatorial Pattern Matching, CPM 2007, volume 4580 of LNCS, pages 228-240. Springer, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73437-6_24.
  17. Mamoru Maekawa. A √n algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst., 3(2):145-159, May 1985. URL: http://dx.doi.org/10.1145/214438.214445.
  18. Michael G. Main and Richard J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. J. Algorithms, 5(3):422-432, 1984. URL: http://dx.doi.org/10.1016/0196-6774(84)90021-X.
  19. Masamichi Miyazaki, Ayumi Shinohara, and Masayuki Takeda. An improved pattern matching algorithm for strings in terms of straight-line programs. In Alberto Apostolico and Jotun Hein, editors, Combinatorial Pattern Matching, CPM 1997, volume 1264 of LNCS, pages 1-11. Springer, 1997. URL: http://dx.doi.org/10.1007/3-540-63220-4_45.
  20. Robert E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215-225, April 1975. URL: http://dx.doi.org/10.1145/321879.321884.
  21. Robert E. Tarjan and Jan van Leeuwen. Worst-case analysis of set union algorithms. J. ACM, 31(2):245-281, 1984. URL: http://dx.doi.org/10.1145/62.2160.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail