PalFM-Index: FM-Index for Palindrome Pattern Matching

Authors: Shinya Nagashita, Tomohiro I



File

LIPIcs.CPM.2023.23.pdf
  • Filesize: 0.94 MB
  • 15 pages

Author Details

Shinya Nagashita
  • Kyushu Institute of Technology, Fukuoka, Japan
Tomohiro I
  • Kyushu Institute of Technology, Fukuoka, Japan

Cite As

Shinya Nagashita and Tomohiro I. PalFM-Index: FM-Index for Palindrome Pattern Matching. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.CPM.2023.23

Abstract

Palindrome pattern matching (pal-matching) is a kind of generalized pattern matching in which two strings x and y of the same length are considered to match (pal-match) if they have the same palindromic structure, i.e., for every 1 ≤ i < j ≤ |x| = |y|, x[i..j] is a palindrome if and only if y[i..j] is a palindrome. The pal-matching problem is to search a text for the occurrences of substrings that pal-match a given pattern. Given a text T of length n over an alphabet of size σ, an index for pal-matching must support, for a pattern P of length m, counting queries, which compute the number occ of occurrences of P, and locating queries, which compute the occurrences of P. The authors of [I et al., Theor. Comput. Sci., 2013] proposed an O(n lg n)-bit data structure that supports counting queries in O(m lg σ) time and locating queries in O(m lg σ + occ) time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies 2n lg min(σ, lg n) + 2n + o(n) bits of space and supports counting queries in O(m) time. The PalFM-index can support locating queries in O(m + Δ occ) time by adding (n/Δ) lg n + n + o(n) bits of space, where Δ is a parameter chosen from {1, 2, …, n} in the preprocessing phase.
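
To make the definition concrete, the following Python sketch checks pal-matching directly from the definition and answers counting and locating queries by scanning every window of the text. It is a brute-force illustration only, not the PalFM-index or any algorithm from the paper; the function names `pal_match`, `locate`, and `count` are hypothetical, and positions are 0-based here whereas the abstract uses 1-based indices.

```python
def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
    return s == s[::-1]


def pal_match(x: str, y: str) -> bool:
    """Check the definition directly: x and y pal-match if they have the
    same length and, for every i < j, x[i..j] is a palindrome exactly
    when y[i..j] is a palindrome."""
    if len(x) != len(y):
        return False
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if is_palindrome(x[i:j + 1]) != is_palindrome(y[i:j + 1]):
                return False
    return True


def locate(text: str, pattern: str) -> list[int]:
    """Naive locating query: 0-based start positions of the substrings
    of text that pal-match pattern."""
    m = len(pattern)
    return [k for k in range(len(text) - m + 1)
            if pal_match(text[k:k + m], pattern)]


def count(text: str, pattern: str) -> int:
    """Naive counting query: the number occ of pal-matching occurrences."""
    return len(locate(text, pattern))


if __name__ == "__main__":
    print(pal_match("abab", "cdcd"))   # same palindromic structure
    print(pal_match("aab", "aba"))     # "aa" is a palindrome, "ab" is not
    print(locate("abaab", "xyx"))
    print(count("abcabb", "xx"))
```

This scan compares all substring pairs for each window, so it is far slower than the O(m) counting time stated for the PalFM-index; it is useful mainly as a reference against which an index implementation could be tested.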

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Palindrome matching
  • Generalized string pattern matching
  • Indexing

References

  1. Jean-Paul Allouche, Michael Baake, Julien Cassaigne, and David Damanik. Palindrome complexity. Theor. Comput. Sci., 292(1):9-31, 2003.
  2. Mira-Cristiana Anisiu, Valeriu Anisiu, and Zoltán Kása. Total palindrome complexity of finite words. Discrete Mathematics, 310(1):109-114, 2010. URL: https://doi.org/10.1016/j.disc.2009.08.002.
  3. Kirill Borozdin, Dmitry Kosolobov, Mikhail Rubinchik, and Arseny M. Shur. Palindromic length in linear time. In Proc. 28th Annual Symposium on Combinatorial Pattern Matching (CPM) 2017, pages 23:1-23:12, 2017. URL: https://doi.org/10.4230/LIPIcs.CPM.2017.23.
  4. Srecko Brlek, Sylvie Hamel, Maurice Nivat, and Christophe Reutenauer. On the palindromic complexity of infinite words. Int. J. Found. Comput. Sci., 15(2):293-306, 2004. URL: https://doi.org/10.1142/S012905410400242X.
  5. Michael Burrows and David J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, HP Labs, 1994.
  6. Xavier Droubay, Jacques Justin, and Giuseppe Pirillo. Episturmian words and some constructions of de Luca and Rauzy. Theor. Comput. Sci., 255(1-2):539-553, 2001. URL: https://doi.org/10.1016/S0304-3975(99)00320-5.
  7. Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In FOCS, pages 390-398, 2000.
  8. Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms, 3(2), 2007.
  9. Gabriele Fici, Travis Gagie, Juha Kärkkäinen, and Dominik Kempa. A subquadratic algorithm for minimum palindromic factorization. Journal of Discrete Algorithms, 28:41-48, 2014. StringMasters 2012 & 2013 Special Issue (Volume 1). URL: https://doi.org/10.1016/j.jda.2014.08.001.
  10. Johannes Fischer and Volker Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput., 40(2):465-492, 2011.
  11. Travis Gagie, Giovanni Manzini, and Rossano Venturini. An encoding for order-preserving matching. In Proc. 25th Annual European Symposium on Algorithms (ESA) 2017, pages 38:1-38:15, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.38.
  12. Zvi Galil and Joel I. Seiferas. A linear-time on-line recognition algorithm for "palstar". J. ACM, 25(1):102-111, 1978. URL: https://doi.org/10.1145/322047.322056.
  13. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In Proc. 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 2017, pages 397-407, 2017. URL: https://doi.org/10.1137/1.9781611974782.25.
  14. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural pattern matching - succinctly. In Proc. 28th International Symposium on Algorithms and Computation (ISAAC) 2017, pages 35:1-35:13, 2017. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2017.35.
  15. Amy Glen, Jacques Justin, Steve Widmer, and Luca Q. Zamboni. Palindromic richness. Eur. J. Comb., 30(2):510-531, 2009. URL: https://doi.org/10.1016/j.ejc.2008.04.006.
  16. Alexander Golynski, Rajeev Raman, and S. Srinivasa Rao. On the redundancy of succinct data structures. In Joachim Gudmundsson, editor, Proc. 11th Scandinavian Workshop on Algorithm Theory (SWAT) 2008, volume 5124 of Lecture Notes in Computer Science, pages 148-159. Springer, 2008.
  17. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 2003, pages 841-850. ACM/SIAM, 2003.
  18. Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Counting and verifying maximal palindromes. In Proc. 17th International Symposium on String Processing and Information Retrieval (SPIRE) 2010, pages 135-146, 2010.
  19. Tomohiro I, Shunsuke Inenaga, and Masayuki Takeda. Palindrome pattern matching. Theor. Comput. Sci., 483:162-170, 2013. URL: https://doi.org/10.1016/j.tcs.2012.01.047.
  20. Tomohiro I, Shiho Sugimoto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing palindromic factorizations and palindromic covers on-line. In Proc. 25th Annual Symposium on Combinatorial Pattern Matching (CPM) 2014, volume 8486 of Lecture Notes in Computer Science, pages 150-161. Springer, 2014.
  21. Ignacio Tinoco Jr., Olke C. Uhlenbeck, and Mark D. Levine. Estimation of secondary structure in ribonucleic acids. Nature, 230:362-367, 1971.
  22. Sung-Hwan Kim and Hwan-Gue Cho. A compact index for Cartesian tree matching. In Pawel Gawrychowski and Tatiana Starikovskaya, editors, Proc. 32nd Annual Symposium on Combinatorial Pattern Matching (CPM) 2021, volume 191 of LIPIcs, pages 18:1-18:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  23. Sung-Hwan Kim and Hwan-Gue Cho. Simpler FM-index for parameterized string matching. Inf. Process. Lett., 165:106026, 2021. URL: https://doi.org/10.1016/j.ipl.2020.106026.
  24. Donald E. Knuth, James H. Morris, and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977.
  25. Dmitry Kosolobov, Mikhail Rubinchik, and Arseny M. Shur. Pal^k is linear recognizable online. In SOFSEM 2015: Theory and Practice of Computer Science - 41st International Conference on Current Trends in Theory and Practice of Computer Science, Pec pod Sněžkou, Czech Republic, January 24-29, 2015. Proceedings, pages 289-301, 2015. URL: https://doi.org/10.1007/978-3-662-46078-8_24.
  26. Glenn K. Manacher. A new linear-time "on-line" algorithm for finding the smallest initial palindrome of a string. J. ACM, 22(3):346-351, 1975. URL: https://doi.org/10.1145/321892.321896.
  27. Yoshiaki Matsuoka, Takahiro Aoki, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Generalized pattern matching and periodicity under substring consistent equivalence relations. Theor. Comput. Sci., 656:225-233, 2016.
  28. Antonio Restivo and Giovanna Rosone. Burrows-Wheeler transform and palindromic richness. Theor. Comput. Sci., 410(30-32):3018-3026, 2009. URL: https://doi.org/10.1016/j.tcs.2009.03.008.
  29. Mikhail Rubinchik and Arseny M. Shur. EERTREE: an efficient data structure for processing palindromes in strings. Eur. J. Comb., 68:249-265, 2018. URL: https://doi.org/10.1016/j.ejc.2017.07.021.