An Encoding for Order-Preserving Matching

Gagie, Travis; Manzini, Giovanni; Venturini, Rossano

doi:10.4230/LIPIcs.ESA.2017.38

Abstract

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., (4, 1, 3, 2) and (10, 3, 7, 5) are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size sigma and a constant c >= 1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m >= log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log(sigma) = Omega(log log n); our query time is optimal if log(sigma) = Omega(log n). Our space bound contrasts with the Omega(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters.
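
To make the definition of an order-preserving match concrete, the following is a minimal Python sketch of a naive checker and occurrence counter. It is not the encoding described in the abstract: it stores the text explicitly and scans every window, so it achieves neither the O(n log log n)-bit space bound nor the O(m) query time. The function names (rank_pattern, is_op_match, count_op_occurrences) are hypothetical and chosen only for illustration, and treating equal characters as sharing a rank is an assumption of this sketch.

```python
def rank_pattern(seq):
    """Map each character to its rank among the distinct values of seq.

    Two sequences have the same relative order exactly when their
    rank patterns are equal. Equal characters share a rank here
    (an assumption of this sketch).
    """
    rank_of = {value: rank for rank, value in enumerate(sorted(set(seq)))}
    return [rank_of[value] for value in seq]


def is_op_match(a, b):
    """Return True if a and b are an order-preserving match."""
    return len(a) == len(b) and rank_pattern(a) == rank_pattern(b)


def count_op_occurrences(text, pattern):
    """Count windows of text that order-preserving match pattern.

    Naive scan: every length-m window is re-ranked from scratch.
    """
    m = len(pattern)
    target = rank_pattern(pattern)
    return sum(
        1
        for start in range(len(text) - m + 1)
        if rank_pattern(text[start:start + m]) == target
    )
```

For the example in the abstract, is_op_match((4, 1, 3, 2), (10, 3, 7, 5)) returns True, since both sequences have rank pattern [3, 0, 2, 1].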

