A Compact Index for Cartesian Tree Matching

Authors Sung-Hwan Kim, Hwan-Gue Cho



PDF
Thumbnail PDF

File

LIPIcs.CPM.2021.18.pdf
  • Filesize: 2.03 MB
  • 19 pages

Document Identifiers

Author Details

Sung-Hwan Kim
  • Pusan National University, South Korea
Hwan-Gue Cho
  • Pusan National University, South Korea

Acknowledgements

We would like to thank anonymous reviewers for their valuable and constructive comments.

Cite As Get BibTex

Sung-Hwan Kim and Hwan-Gue Cho. A Compact Index for Cartesian Tree Matching. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.CPM.2021.18

Abstract

Cartesian tree matching is a recently introduced string matching problem in which two strings match if their corresponding Cartesian trees are the same. It is considered appropriate to find patterns regarding their shapes especially in numerical time series data. While many related problems have been addressed, developing a compact index has received relatively less attention. In this paper, we present a 3n+o(n)-bit index that can count the number of occurrences of a Cartesian tree pattern in 𝒪(m) time where n and m are the text and pattern length. To the best of our knowledge, this work is the first 𝒪(n)-bit compact data structure for indexing for this problem.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • String Matching
  • Suffix Array
  • FM-index
  • Compact Index
  • Cartesian Tree Matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir, Timothy M. Chan, Moshe Lewenstein, and Noa Lewenstein. On hardness of jumbled indexing. In Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP), pages 114-125, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_10.
  2. Amihood Amir and Eitan Kondratovsky. Sufficient conditions for efficient indexing under different matchings. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 6:1-12, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.6.
  3. Brenda S. Baker. Parameterized pattern matching: Algorithm and applications. Journal of Computer and System Sciences, 52:28-42, 1996. URL: https://doi.org/10.1006/jcss.1996.0003.
  4. Nieves R. Brisaboa, Susana Ladra, and Gonzalo Navarro. DACs: Bringing direct access to variable-length codes. Information Processing and Management, 49(1):392-404, 2013. URL: https://doi.org/10.1016/j.ipm.2012.08.003.
  5. Vincent Cohen-Addad, Laurent Feuilloley, and Tatiana Starikovskaya. Lower bounds for text indexing with mismatches and differences. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1146-1164, 2019. URL: https://doi.org/10.1137/1.9781611975482.70.
  6. Richard Cole and Ramesh Hariharan. Faster suffix tree construction with missing suffix links. SIAM Journal on Computing, 33(1):26-42, 2003. URL: https://doi.org/10.1137/S0097539701424465.
  7. Maxime Crochemore, Costas S. Iliopoulos, Thierry Lecroq, and Y. J. Pinzon. Approximate string matching in musical sequences. In Proceedings of the Prague Stringology Conference, pages 1-11, 2001. Google Scholar
  8. Maxime Crochemore, Costas S. Iliopulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Order-preserving indexing. Theoretical Computer Science, 638:122-135, 2016. URL: https://doi.org/10.1016/j.tcs.2015.06.050.
  9. Erik D. Demaine, Gad M. Landau, and Oren Weimann. On cartesian trees and range minimum queries. Algorithmica, 68:610-625, 2014. URL: https://doi.org/10.1007/s00453-012-9683-x.
  10. Simone Faro, Thierry Lecroq, and Kunsoo Park. Fast practical computation of the longest common cartesian substring of two strings. In Proceedings of the Prague Stringology Conference, pages 48-60, 2020. Google Scholar
  11. Paolo Ferragina and Giovanni Manzini. Compression boosting in optimal linear time using the burrows-wheeler transform. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 655-663, 2004. URL: https://dl.acm.org/doi/10.5555/982792.982892.
  12. Luca Foschini, Roberto Grossi, Ankur Gupta, and Jeffery Scott Vitter. When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Transactions on Algorithms, 2(4):611-639, 2006. URL: https://doi.org/10.1145/1198513.1198521.
  13. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural pattern matching - succinctly. In Proceedings of the 28th International Symposium on Algorithms and Computation (ISAAC), pages 35:1-13, 2017. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2017.35.
  14. Arnab Ganguly, Rahul Shah, and SharmaV. Thankachan. pbwt: Achieving succinct data structures for parameterized pattern matching and related problems. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 397-407, 2017. URL: https://doi.org/10.1137/1.9781611974782.25.
  15. Paweĺ Gawrychowski, Samah Ghazawi, and Gad M. Landau. On indeterminate string matching. In Proceedings of the 31th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 14:1-14, 2020. URL: https://doi.org/10.4230/LIPIcs.CPM.2020.14.
  16. Alexander Golynski, Roberto Grossi, Ankur Gupta, Rajeev Raman, and Satti Srinivasa Rao. On the size of succinct indices. In Proceedings of the 15th European Symposium on Algorithms (ESA), pages 371-382, 2007. URL: https://doi.org/10.1007/978-3-540-75520-3_34.
  17. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841-850, 2003. URL: https://dl.acm.org/doi/10.5555/644108.644250.
  18. Geonmo Gu, Siwoo Song, Simone Faro, Thierry Lecroq, and Kunsoo Park. Fast multiple pattern cartesian tree matching. In Proceedings of the 14th International Workshop on Algorithms and Computations (WALCOM), pages 107-119, 2020. URL: https://doi.org/10.1007/978-3-030-39881-1_10.
  19. Diptarama Hendrian. Generalized dictionary matching under substring consistent equivalence relations. In Proceedings of the 14th International Workshop on Algorithms and Computations (WALCOM), pages 120-132, 2020. URL: https://doi.org/10.1007/978-3-030-39881-1_11.
  20. Juha Kärkkäinen and Simon J. Puglisi. Fixed block compression boosting in fm-indexes. In Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 174-184, 2011. URL: https://doi.org/10.1007/s00453-018-0475-9.
  21. Natsumi Kikuchi, Diptarama Hendrian, Ryo Yoshinaka, and Ayumi Shinohara. Computing covers under substring consistent equivalence relations. In Proceedings of the 27th International Symposium on String Processing and Information Retrieval (SPIRE), pages 131-146, 2020. URL: https://doi.org/10.1007/978-3-030-59212-7_10.
  22. Jinil Kim, Peter Eades, Rudolf Fleischer, Seok-Hee Hong, Costas S. Iliopoulos, Kunsoo Park, Simon J. Puglisi, and Takeshi Tokuyama. Order-preserving matching. Theoretical Computer Science, 525:68-79, 2014. URL: https://doi.org/10.1016/j.tcs.2013.10.006.
  23. Sung-Hwan Kim and Hwan-Gue Cho. Indexing isodirectional pointer sequences. In Proceedings of the 31st International Symposium on Algorithms and Computation (ISAAC 2020), pages 35:1-35:15, 2020. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2020.35.
  24. Sung-Hwan Kim and Hwan-Gue Cho. Simpler fm-index for parameterized string matching. Information Processing Letters, page 106026, 2020. (Online available). URL: https://doi.org/10.1016/j.ipl.2020.106026.
  25. Moshe Lewenstein. Indexing with gaps. In Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 135-143, 2011. URL: https://doi.org/10.1007/978-3-642-24583-1_14.
  26. Veli Mäkinen and Gonzalo Navarro. Implicit compression boosting with applications to self-indexing. In Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), pages 229-241, 2007. URL: https://doi.org/10.1007/978-3-540-75530-2_21.
  27. Gonzalo Navarro. Wavelet trees for all. Journal of Discrete Algorithms, 25:2-20, 2014. URL: https://doi.org/10.1016/j.jda.2013.07.004.
  28. Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian tree matching and indexing. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 16:1-14, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.16.
  29. Tetsuo Shibuya. Generalization of a suffix tree for RNA structural pattern matching. In Proceedings of the 7th Scandinavian Workshop on Algorithm Theory (SWAT), pages 393-406, 2000. URL: https://doi.org/10.1007/s00453-003-1067-9.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail