Simple Order-Isomorphic Matching Index with Expected Compact Space

Authors Sung-Hwan Kim, Hwan-Gue Cho



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2022.61.pdf
  • Filesize: 2.03 MB
  • 17 pages

Document Identifiers

Author Details

Sung-Hwan Kim
  • Pusan National University, Busan, South Korea
Hwan-Gue Cho
  • Pusan National University, Busan, South Korea

Acknowledgements

The authors would like to thank anonymous reviewers for their helpful comments.

Cite As Get BibTex

Sung-Hwan Kim and Hwan-Gue Cho. Simple Order-Isomorphic Matching Index with Expected Compact Space. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 61:1-61:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ISAAC.2022.61

Abstract

In this paper, we present a novel indexing method for the order-isomorphic pattern matching problem (also known as order-preserving pattern matching, or consecutive permutation matching), in which two equal-length strings are defined to match when X[i] < X[j] iff Y[i] < Y[j] for 0 ≤ i,j < |X|. We observe an interesting relation between the order-isomorphic matching and the insertion process of a binary search tree, based on which we propose a data structure which not only has a concise structure comprised of only two wavelet trees but also provides a surprisingly simple searching algorithm. In the average case analysis, the proposed method requires 𝒪(R(T)) bits, and it is capable of answering a count query in 𝒪(R(P)) time, and reporting an occurrence in 𝒪(lg |T|) time, where T and P are the text and the pattern string, respectively; for a string X, R(X) is the total time taken for the construction of the binary search tree by successively inserting the keys X[|X|-1],⋯,X[0] at the root, and its expected value is 𝒪(|X|lgσ) where σ is the alphabet size. Furthermore, the proposed method can be viewed as a generalization of some other methods including several heuristics and restricted versions described in previous studies in the literature.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Compact Data Structure
  • String Matching
  • Order-Preserving Matching
  • Suffix Array
  • FM-index
  • Binary Search Tree

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir and Eitan Kondratovsky. Sufficient Conditions for Efficient Indexing Under Different Matchings. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 6:1-6:12, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.6.
  2. Brenda S. Baker. Parameterized Pattern Matching: Algorithm and Applications. Journal of Computer and System Sciences, 52:28-42, 1996. URL: https://doi.org/10.1006/jcss.1996.0003.
  3. Tamanna Chhabra, M. Oguzhan Külekci, and Jorma Tarhio. Alternative Algorithms for Order-Preserving Matching. In Proceedings of the Prague Stringology Conference, pages 36-46, 2015. Google Scholar
  4. David Clark. Compact Pat Trees. PhD thesis, University of Waterloo, 1996. Google Scholar
  5. Richard Cole and Ramesh Hariharan. Faster Suffix Tree Construction with Missing Suffix Links. In Proceedings of the 32nd Annual Symposium on Theory of Computing (STOC), pages 407-415, 2000. URL: https://doi.org/10.1145/335305.335352.
  6. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, and Solon P. Pissis. Order-Preserving Incomplete Suffix Trees and Order-Preserving Indexes. In Proceedings of the 20th International Symposium on String Processing and Information Retrieval (SPIRE), pages 84-95, 2013. URL: https://doi.org/10.1007/978-3-319-02432-5_13.
  7. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Order-preserving Indexing. Theoretical Computer Science, 628:122-135, 2016. URL: https://doi.org/10.1016/j.tcs.2015.06.050.
  8. Gianni Decaroli, Travis Gagie, and Giovanni Manzini. A Compact Index for Order-preserving Pattern Matching. Software: Practice and Experience, 46(6):1041-1051, 2019. Google Scholar
  9. Paolo Ferragina and Giovanni Manzini. Opportunistic Data Structures with Applications. In Proceedings of the 41st Annual Symposium on Foundation of Computer Science (FOCS), pages 390-398, 2000. URL: https://doi.org/10.1109/SFCS.2000.892127.
  10. Travis Gagie, Giovanni Manzini, and Rossano Venturini. An Encoding for Order-Preserving Matching. In Proceedings of the 25th European Symposium on Algorithms (ESA), pages 38:1-38:15, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.38.
  11. Arnab Ganguly, Dhrumil Patel, Rahul Shah, and Sharma V. Thankachan. LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching. In Proceedings of the 48th International Colloquium on Automata, Languages, and Programming (ICALP), pages 71:1-71:19, 2021. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.71.
  12. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural Pattern Matching - Succinctly. In Proceedings of the 28th International Symposium on Algorithms and Computation (ISAAC), pages 35:1-35:13, 2017. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2017.35.
  13. Arnab Ganguly, Rahul Shah, and SharmaV. Thankachan. pBWT: Achieving Succinct Data Structures for Parameterized Pattern Matching and Related Problems. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 397-407, 2017. URL: https://doi.org/10.1137/1.9781611974782.25.
  14. G. Jacobson. Space-efficient Static Trees and Graphs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 549-554, 1989. URL: https://doi.org/10.1109/SFCS.1989.63533.
  15. Jinil Kim, Peter Eades, Rudolf Fleischer, Seok-Hee Hong, Costas S. Iliopoulos, Kunsoo Park, Simon J. Puglisi, and Takeshi Tokuyama. Order-preserving Matching. Theoretical Computer Science, 525:68-79, 2014. URL: https://doi.org/10.1016/j.tcs.2013.10.006.
  16. Sung-Hwan Kim and Hwan-Gue Cho. Indexing Isodirectional Pointer Sequences. In Proceedings of the 31st International Symposium on Algorithms and Computation (ISAAC), pages 35:1-35:15, 2020. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2020.35.
  17. Sung-Hwan Kim and Hwan-Gue Cho. A Compact Index for Cartesian Tree Matching. In Proceedings of the 32nd Annual Symposium on Combinatorial Pattern Matching (CPM), pages 18:1-18:19, 2021. URL: https://doi.org/10.4230/LIPIcs.CPM.2021.18.
  18. Sung-Hwan Kim and Hwan-Gue Cho. Simpler FM-index for Parameterized String Matching. Information Processing Letters, page 106026, 2021. URL: https://doi.org/10.1016/j.ipl.2020.106026.
  19. M. Kubica, T. Kulczyński, J. Radoszewski, W. Rytter, and T. Waleń. A Linear Time Algorithm for Consecutive Permutation Pattern Matching. Information Processing Letters, 113(12):430-433, 2013. URL: https://doi.org/10.1016/j.ipl.2013.03.015.
  20. Gonzalo Navarro. Wavelet Trees for All. Journal of Discrete Algorithms, 25:2-20, 2014. URL: https://doi.org/10.1016/j.jda.2013.07.004.
  21. Sung Gwan Park, Amihood Amir, Gad M. Landau, and Kunsoo Park. Cartesian Tree Matching and Indexing. In Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 16:1-14, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.16.
  22. Bruce Reed. The Height of a Random Binary Search Tree. Journal of the ACM, 50(3):306-332, 2003. URL: https://doi.org/10.1145/765568.765571.
  23. Tetsuo Shibuya. Generalization of a Suffix Tree for RNA Structural Pattern Matching. In Proceedings of the 7th Scandinavian Workshop on Algorithm Theory (SWAT), pages 393-406, 2000. URL: https://doi.org/10.5555/645900.672451.
  24. C. J. Stephenson. A Method for Constructing Binary Search Trees by Making Insertions at the Root. International Journal of Computer & Information Science, 9:15-29, 1980. URL: https://doi.org/10.1007/BF00995807.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail