Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences

Authors Satoshi Kobayashi, Diptarama Hendrian , Ryo Yoshinaka , Ayumi Shinohara



PDF
Thumbnail PDF

File

LIPIcs.SEA.2020.13.pdf
  • Filesize: 0.56 MB
  • 13 pages

Document Identifiers

Author Details

Satoshi Kobayashi
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Diptarama Hendrian
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Ryo Yoshinaka
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan
Ayumi Shinohara
  • Graduate School of Information Sciences, Tohoku University, Sendai, Japan

Cite AsGet BibTex

Satoshi Kobayashi, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara. Fast and Linear-Time String Matching Algorithms Based on the Distances of q-Gram Occurrences. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 13:1-13:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.SEA.2020.13

Abstract

Given a text T of length n and a pattern P of length m, the string matching problem is a task to find all occurrences of P in T. In this study, we propose an algorithm that solves this problem in O((n + m)q) time considering the distance between two adjacent occurrences of the same q-gram contained in P. We also propose a theoretical improvement of it which runs in O(n + m) time, though it is not necessarily faster in practice. We compare the execution times of our and existing algorithms on various kinds of real and artificial datasets such as an English text, a genome sequence and a Fibonacci string. The experimental results show that our algorithm is as fast as the state-of-the-art algorithms in many cases, particularly when a pattern frequently appears in a text.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • String matching algorithm
  • text processing

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Cyril Allauzen, Maxime Crochemore, and Mathieu Raffinot. Factor oracle: A new structure for pattern matching. In Jan Pavelka, Gerard Tel, and Miroslav Bartošek, editors, SOFSEM'99: Theory and Practice of Informatics, pages 295-310. Springer Berlin Heidelberg, 1999. Google Scholar
  2. Ross Arnold and Tim Bell. A corpus for the evaluation of lossless compression algorithms. In Proceedings of DCC '97. Data Compression Conference, pages 201-210, 1997. Google Scholar
  3. Robert S. Boyer and J. Strother Moore. A fast string searching algorithm. Commun. ACM, 20(10):762-772, 1977. URL: https://doi.org/10.1145/359842.359859.
  4. Domenico Cantone and Simone Faro. Searching for a substring with constant extra-space complexity. In Proceedings of Third International Conference on Fun with algorithms, pages 118-131, 2004. Google Scholar
  5. Domenico Cantone and Simone Faro. Fast-search algorithms: New efficient variants of the Boyer-Moore pattern-matching algorithm. Journal of Automata, Languages and Combinatorics, 10:589-608, 2005. URL: https://doi.org/10.1007/3-540-44867-5_4.
  6. Domenico Cantone and Simone Faro. Improved and self-tuned occurrence heuristics. Journal of Discrete Algorithms, 28:73-84, 2014. URL: https://doi.org/10.1016/j.jda.2014.07.006.
  7. Domenico Cantone, Simone Faro, and Emanuele Giaquinta. A compact representation of nondeterministic (suffix) automata for the bit-parallel approach. Information and Computation, 213:3-12, 2012. Special Issue: Combinatorial Pattern Matching (CPM 2010). URL: https://doi.org/10.1016/j.ic.2011.03.006.
  8. Domenico Cantone, Simone Faro, and Arianna Pavone. Linear and Efficient String Matching Algorithms Based on Weak Factor Recognition. Journal of Experimental Algorithmics, 24(1):1-20, 2019. URL: https://doi.org/10.1145/3301295.
  9. Simone Faro. A very fast string matching algorithm based on condensed alphabets. In Riccardo Dondi, Guillaume Fertin, and Giancarlo Mauri, editors, Algorithmic Aspects in Information and Management - 11th International Conference, AAIM 2016, Bergamo, Italy, July 18-20, 2016, Proceedings, volume 9778 of Lecture Notes in Computer Science, pages 65-76. Springer, 2016. URL: https://doi.org/10.1007/978-3-319-41168-2_6.
  10. Simone Faro and Thierry Lecroq. A fast suffix automata based algorithm for exact online string matching. In Nelma Moreira and Rogério Reis, editors, Implementation and Application of Automata, pages 149-158. Springer Berlin Heidelberg, 2012. Google Scholar
  11. Simone Faro and Thierry Lecroq. A multiple sliding windows approach to speed up string matching algorithms. In Ralf Klasing, editor, Experimental Algorithms, pages 172-183. Springer Berlin Heidelberg, 2012. Google Scholar
  12. Simone Faro and Thierry Lecroq. The exact online string matching problem. ACM Computing Surveys, 45(2):1-42, 2013. URL: https://doi.org/10.1145/2431211.2431212.
  13. Simone Faro, Thierry Lecroq, Stefano Borzì, Simone Di Mauro, and Alessandro Maggio. The string matching algorithms research tool. In Jan Holub and Jan Žďárek, editors, Proceedings of the Prague Stringology Conference 2016, pages 99-113, Czech Technical University in Prague, Czech Republic, 2016. Google Scholar
  14. Frantisek Franek, Christopher G. Jennings, and W.F. Smyth. A simple fast hybrid pattern-matching algorithm. Journal of Discrete Algorithms, 5(4):682-695, 2007. URL: https://doi.org/10.1016/J.JDA.2006.11.004.
  15. Saqib I. Hakak, Amirrudin Kamsin, Palaiahnakote Shivakumara, Gulshan A. Gilkar, Wazir Z. Khan, and Muhammad Imran. Exact string matching algorithms: Survey, issues, and future research directions. IEEE Access, 7:69614-69637, 2019. Google Scholar
  16. R. Nigel Horspool. Practical fast searching in strings. Software: Practice and Experience, 10(6):501-506, 1980. URL: https://doi.org/10.1002/spe.4380100608.
  17. Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323-350, 1977. URL: https://doi.org/10.1137/0206024.
  18. Satoshi Kobayashi, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara. An improvement of the Franek-Jennings-Smyth pattern matching algorithm. In Proceedings of the Prague Stringology Conference 2019, pages 56-68, 2019. Google Scholar
  19. Thierry Lecroq. Fast exact string matching algorithms. Information Processing Letters, 102(6):229-235, 2007. URL: https://doi.org/10.1016/j.ipl.2007.01.002.
  20. Gonzalo Navarro and Mathieu Raffinot. A bit-parallel approach to suffix automata: Fast extended string matching. In Martin Farach-Colton, editor, Combinatorial Pattern Matching, pages 14-33. Springer Berlin Heidelberg, 1998. Google Scholar
  21. Daniel M. Sunday and Daniel M. A very fast substring search algorithm. Communications of the ACM, 33(8):132-142, 1990. URL: https://doi.org/10.1145/79173.79184.
  22. Sun Wu and Udi Manber. A fast algorithm for multi-pattern searching. Technical Report TR-94-17, Department of Computer Science, Chung-Cheng University, 1994. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail