An Encoding for Order-Preserving Matching

Authors Travis Gagie, Giovanni Manzini, Rossano Venturini



File

LIPIcs.ESA.2017.38.pdf
  • Filesize: 465 kB
  • 15 pages

Cite As

Travis Gagie, Giovanni Manzini, and Rossano Venturini. An Encoding for Order-Preserving Matching. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 38:1-38:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ESA.2017.38

Abstract

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., (4, 1, 3, 2) and (10, 3, 7, 5) are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size sigma and a constant c >= 1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m >= log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log(sigma) = Omega(log log n); our query time is optimal if log(sigma) = Omega(log n). Our space bound contrasts with the Omega(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters.
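
To make the definition of an order-preserving match concrete, the following is a minimal Python sketch of a naive checker and a brute-force occurrence counter. It is not the encoding described in the paper: it keeps the whole string S and takes roughly O(n m log m) time per query, so it only serves as a reference for what a query is supposed to return. The function names (`rank_pattern`, `is_op_match`, `count_op_occurrences`) are hypothetical, and treating equal characters as having equal rank is an assumption of this sketch.

```python
from typing import List, Sequence


def rank_pattern(s: Sequence[int]) -> List[int]:
    """Replace each character by the rank of its value among the distinct values of s."""
    # Assumption of this sketch: equal characters receive equal ranks.
    rank_of = {value: rank for rank, value in enumerate(sorted(set(s)))}
    return [rank_of[value] for value in s]


def is_op_match(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a and b have the same length and the same relative order."""
    return len(a) == len(b) and rank_pattern(a) == rank_pattern(b)


def count_op_occurrences(text: Sequence[int], pattern: Sequence[int]) -> int:
    """Count the windows of text that are an order-preserving match for pattern."""
    m = len(pattern)
    target = rank_pattern(pattern)
    return sum(
        1
        for start in range(len(text) - m + 1)
        if rank_pattern(text[start:start + m]) == target
    )


if __name__ == "__main__":
    # The example pair from the abstract.
    print(is_op_match((4, 1, 3, 2), (10, 3, 7, 5)))
    # A brute-force count over a short, made-up string.
    print(count_op_occurrences([10, 3, 7, 5, 9, 2, 6, 4], [4, 1, 3, 2]))
```

Comparing rank sequences rather than raw values is what makes (4, 1, 3, 2) and (10, 3, 7, 5) equivalent: both reduce to the same rank sequence, so only the relative order of the characters matters.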
Keywords
  • Compact data structures
  • encodings
  • order-preserving matching
