LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching

Authors Arnab Ganguly, Dhrumil Patel, Rahul Shah, Sharma V. Thankachan



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2021.71.pdf
  • Filesize: 0.97 MB
  • 19 pages

Document Identifiers

Author Details

Arnab Ganguly
  • Department of Computer Science, University of Wisconsin - Whitewater, WI, USA
Dhrumil Patel
  • School of EECS, Louisiana State University, Baton Rouge, LA, USA
Rahul Shah
  • School of EECS, Louisiana State University, Baton Rouge, LA, USA
Sharma V. Thankachan
  • Department of Computer Science, University of Central Florida, Orlando, FL, USA

Cite AsGet BibTex

Arnab Ganguly, Dhrumil Patel, Rahul Shah, and Sharma V. Thankachan. LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 71:1-71:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.71

Abstract

Two strings are order isomorphic iff the relative ordering of their characters is the same at all positions. For a given text T[1,n] over an ordered alphabet of size σ, we can maintain an order-isomorphic suffix tree/array in O(nlog n) bits and support (order-isomorphic) pattern/substring matching queries efficiently. It is interesting to know if we can encode these structures in space close to the text’s size of nlogσ bits. We answer this question positively by presenting an O(nlog σ)-bit index that allows access to any entry in order-isomorphic suffix array (and its inverse array) in t_{SA} = {O}(log²n/logσ) time. For any pattern P given as a query, this index can count the number of substrings of T that are order-isomorphic to P (denoted by occ) in {O}((|P|logσ+t_{SA})log n) time using standard techniques. Also, it can report the locations of those substrings in additional O(occ ⋅ t_{SA}) time.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Succinct data structures
  • Pattern Matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Brenda S. Baker. A theory of parameterized pattern matching: algorithms and applications. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA, pages 71-80, 1993. URL: https://doi.org/10.1145/167088.167115.
  2. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm, 1994. Google Scholar
  3. Domenico Cantone, Simone Faro, and M. Oguzhan Külekci. The order-preserving pattern matching problem in practice. Discret. Appl. Math., 274:11-25, 2020. URL: https://doi.org/10.1016/j.dam.2018.10.023.
  4. Richard Cole and Ramesh Hariharan. Faster suffix tree construction with missing suffix links. SIAM J. Comput., 33(1):26-42, 2003. An extended abstract appeared in STOC 2000. URL: https://doi.org/10.1137/S0097539701424465.
  5. Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Order-preserving indexing. Theor. Comput. Sci., 638:122-135, 2016. URL: https://doi.org/10.1016/j.tcs.2015.06.050.
  6. Gianni Decaroli, Travis Gagie, and Giovanni Manzini. A compact index for order-preserving pattern matching. In 2017 Data Compression Conference, DCC 2017, Snowbird, UT, USA, April 4-7, 2017, pages 72-81, 2017. URL: https://doi.org/10.1109/DCC.2017.35.
  7. Paolo Ferragina and Giovanni Manzini. Indexing compressed text. J. ACM, 52(4):552-581, 2005. An extended abstract appeared in FOCS 2000 under the title "Opportunistic Data Structures with Applications". URL: https://doi.org/10.1145/1082036.1082039.
  8. Travis Gagie, Giovanni Manzini, and Rossano Venturini. An encoding for order-preserving matching. In 25th Annual European Symposium on Algorithms, ESA 2017, September 4-6, 2017, Vienna, Austria, pages 38:1-38:15, 2017. URL: https://doi.org/10.4230/LIPIcs.ESA.2017.38.
  9. Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. A framework for designing space-efficient dictionaries for parameterized and order-preserving matching. Theor. Comput. Sci., 854:52-62, 2021. URL: https://doi.org/10.1016/j.tcs.2020.11.036.
  10. Arnab Ganguly, Rahul Shah, and Sharma V Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 397-407. Society for Industrial and Applied Mathematics, 2017. Google Scholar
  11. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural pattern matching - succinctly. In Yoshio Okamoto and Takeshi Tokuyama, editors, 28th International Symposium on Algorithms and Computation, ISAAC 2017, December 9-12, 2017, Phuket, Thailand, volume 92 of LIPIcs, pages 35:1-35:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2017.35.
  12. Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 841-850, 2003. Google Scholar
  13. Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput., 35(2):378-407, 2005. An extended abstract appeared in STOC 2000. URL: https://doi.org/10.1137/S0097539702402354.
  14. Jinil Kim, Peter Eades, Rudolf Fleischer, Seok-Hee Hong, Costas S. Iliopoulos, Kunsoo Park, Simon J. Puglisi, and Takeshi Tokuyama. Order-preserving matching. Theor. Comput. Sci., 525:68-79, 2014. URL: https://doi.org/10.1016/j.tcs.2013.10.006.
  15. Sung-Hwan Kim and Hwan-Gue Cho. Simpler fm-index for parameterized string matching. Inf. Process. Lett., 165:106026, 2021. URL: https://doi.org/10.1016/j.ipl.2020.106026.
  16. Marcin Kubica, Tomasz Kulczyński, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. A linear time algorithm for consecutive permutation pattern matching. Information Processing Letters, 113(12):430-433, 2013. Google Scholar
  17. Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. URL: https://doi.org/10.1137/0222058.
  18. Juan Mendivelso, Sharma V. Thankachan, and Yoan J. Pinzón. A brief history of parameterized matching problems. Discret. Appl. Math., 274:103-115, 2020. URL: https://doi.org/10.1016/j.dam.2018.07.017.
  19. Temma Nakamura, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Order preserving pattern matching on trees and dags. In Gabriele Fici, Marinella Sciortino, and Rossano Venturini, editors, String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings, volume 10508 of Lecture Notes in Computer Science, pages 271-277. Springer, 2017. URL: https://doi.org/10.1007/978-3-319-67428-5_23.
  20. Gonzalo Navarro. Compact data structures: A practical approach. Cambridge University Press, 2016. Google Scholar
  21. Gonzalo Navarro and Kunihiko Sadakane. Fully functional static and dynamic succinct trees. ACM Trans. Algorithms, 10(3):16:1-16:39, 2014. An extended abstract appeared in SODA 2010. URL: https://doi.org/10.1145/2601073.
  22. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory Comput. Syst., 41(4):589-607, 2007. URL: https://doi.org/10.1007/s00224-006-1198-x.
  23. Peter Weiner. Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1-11, 1973. URL: https://doi.org/10.1109/SWAT.1973.13.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail