Indexing Isodirectional Pointer Sequences

Authors Sung-Hwan Kim, Hwan-Gue Cho



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2020.35.pdf
  • Filesize: 1.79 MB
  • 15 pages

Document Identifiers

Author Details

Sung-Hwan Kim
  • Department of Electrical and Computer Engineering, Pusan National University, South Korea
Hwan-Gue Cho
  • Department of Electrical and Computer Engineering, Pusan National University, South Korea

Cite AsGet BibTex

Sung-Hwan Kim and Hwan-Gue Cho. Indexing Isodirectional Pointer Sequences. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 35:1-35:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ISAAC.2020.35

Abstract

Many sequential and temporal data have dependency relationships among their elements, which can be represented as a sequence of pointers. In this paper, we introduce a new string matching problem with a particular type of strings, which we call isodirectional pointer sequence, in which each entry has a pointer to another entry. The proposed problem is not only a formalization of real-world dependency matching problems, but also a generalization of variants of the string matching problem such as parameterized pattern matching and Cartesian tree matching. We present a 2nlgσ+2n+o(n)-bit index that preprocesses the text T[1:n] so as to count the number of occurrences of pattern P[1:m] in 𝒪(mlgσ) where σ is the number of distinct lengths of pointers in T. Our index is also easily implementable in practice because it consists of wavelet trees and range maximum query index, which are widely used building blocks in many other compact data structures. By compressing the wavelet trees, the index can also be stored into 2nH^*₀(T)+2n+o(n) bits where H^*₀(T) is the 0-th order empirical entropy of the distribution of pointer lengths of T.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • String Matching
  • Suffix Array
  • FM-index
  • Wavelet Tree
  • Range Minimum Query
  • Parameterized String Matching
  • Cartesian Tree Matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir and Eitan Kondratovsky. Sufficient conditions for efficient indexing under different matchings. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 6:1-12, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.6.
  2. Brenda S. Baker. Parameterized pattern matching: Algorithm and applications. Journal of Computer and System Sciences, 52:28-42, 1996. URL: https://doi.org/10.1006/jcss.1996.0003.
  3. Richard Cole and Ramesh Hariharan. Faster suffix tree construction with missing suffix links. SIAM Journal on Computing, 33(1):26-42, 2003. URL: https://doi.org/10.1137/S0097539701424465.
  4. Maxime Crochemore, Costas S. Iliopulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Order-preserving indexing. Theoretical Computer Science, 638:122-135, 2016. URL: https://doi.org/10.1016/j.tcs.2015.06.050.
  5. Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In Proceedings of the 41st Annual Symposium on Foundation of Computer Science (FOCS), pages 390-398, 2000. URL: https://doi.org/10.1109/SFCS.2000.892127.
  6. Paolo Ferragina and Giovanni Manzini. Compression boosting in optimal linear time using the burrows-wheeler transform. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 655-663, 2004. URL: https://dl.acm.org/doi/10.5555/982792.982892.
  7. Johannes Fischer and Volker Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing, 40(2):465-492, 2011. URL: https://doi.org/10.1137/090779759.
  8. Arnab Ganguly, Rahul Shah, and SharmaV. Thankachan. pbwt: Achieving succinct data structures for parameterized pattern matching and related problems. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 397-407, 2017. URL: https://doi.org/10.1137/1.9781611974782.25.
  9. PawełGawrychowski, Seungbum Jo, Shay Mozes, and Oren Weimann. Compressed range minmum queries. Theoretical Computer Science, 812:39-48, 2020. URL: https://doi.org/10.1016/j.tcs.2019.07.002.
  10. Alexander Golynski, Roberto Grossi, Ankur Gupta, Rajeev Raman, and Satti Srinivasa Rao. On the size of succinct indices. In Proceedings of the 15th European Symposium on Algorithms (ESA), pages 371-382, 2007. URL: https://doi.org/10.1007/978-3-540-75520-3_34.
  11. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841-850, 2003. URL: https://dl.acm.org/doi/10.5555/644108.644250.
  12. Juha Kärkkäinen and Simon J. Puglisi. Fixed block compression boosting in fm-indexes. In Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 174-184, 2011. URL: https://doi.org/10.1007/s00453-018-0475-9.
  13. Jinil Kim, Peter Eades, Rudolf Fleischer, Seok-Hee Hong, Costas S. Iliopoulos, Kunsoo Park, Simon J. Puglisi, and Takeshi Tokuyama. Order-preserving matching. Theoretical Computer Science, 525:68-79, 2014. URL: https://doi.org/10.1016/j.tcs.2013.10.006.
  14. Sung-Hwan Kim and Hwan-Gue Cho. Simpler fm-index for parameterized string matching. Information Processing Letters, page 106026, 2020. (Online available). URL: https://doi.org/10.1016/j.ipl.2020.106026.
  15. Veli Mäkinen and Gonzalo Navarro. Implicit compression boosting with applications to self-indexing. In Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 229-241, 2007. URL: https://doi.org/10.1007/978-3-540-75530-2_21.
  16. Gonzalo Navarro. Wavelet trees for all. Journal of Discrete Algorithms, 25:2-20, 2014. URL: https://doi.org/10.1016/j.jda.2013.07.004.
  17. Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian tree matching and indexing. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 16:1-14, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.16.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail