Non-Overlapping Indexing - Cache Obliviously

Hooshmand, Sahar; Abedin, Paniz; Külekci, M. Oguzhan; Thankachan, Sharma V.

doi:10.4230/LIPIcs.CPM.2018.8

Abstract

The non-overlapping indexing problem is defined as follows: pre-process a given text T[1,n] of length n into a data structure such that whenever a pattern P[1,p] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T. The best known solution is by Cohen and Porat [ISAAC, 2009]. Their index size is O(n) words and query time is optimal O(p+nocc), where nocc is the output size. We study this problem in the cache-oblivious model and present a new data structure of size O(n log n) words. It can answer queries in optimal O(p/(B)+log_B n+nocc/B) I/Os, where B is the block size.

Cite As Get BibTex

Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, and Sharma V. Thankachan. Non-Overlapping Indexing - Cache Obliviously. In 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 105, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.CPM.2018.8

Author Details

Sahar Hooshmand

Dept. of Computer Science, University of Central Florida - Orlando, USA

Paniz Abedin

Dept. of Computer Science, University of Central Florida - Orlando, USA

M. Oguzhan Külekci

Informatics Institute, Istanbul Technical University - Turkey

Sharma V. Thankachan

Dept. of Computer Science, University of Central Florida - Orlando, USA

Funding

Part of this work was done while the last author was visiting the third author with the TÜBİTAK-BİDEB 2221 program grant number 1059B211700766. This work has also received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 69094 in the form of student travel grant to present a preliminary version of this work in the 12th Workshop on Compression, Text and Algorithms (WCTA), 2017.

References

Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116-1127, 1988. URL: http://dx.doi.org/10.1145/48529.48535.
Alberto Apostolico and Franco P Preparata. Data structures and algorithms for the string statistics problem. Algorithmica, 15(5):481-494, 1996.
Michael A. Bender, Erik D. Demaine, and Martin Farach-Colton. Cache-oblivious b-trees. SIAM J. Comput., 35(2):341-358, 2005. URL: http://dx.doi.org/10.1137/S0097539701389956.
Gerth Stølting Brodal and Rolf Fagerberg. Cache-oblivious string dictionaries. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 581-590. ACM Press, 2006. URL: http://dl.acm.org/citation.cfm?id=1109557.1109621.
Hagai Cohen and Ely Porat. Range non-overlapping indexing. In Yingfei Dong, Ding-Zhu Du, and Oscar H. Ibarra, editors, Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18, 2009. Proceedings, volume 5878 of Lecture Notes in Computer Science, pages 1044-1053. Springer, 2009. URL: http://dx.doi.org/10.1007/978-3-642-10631-6_105.
Maxime Crochemore. String-matching on ordered alphabets. Theoretical Computer Science, 92(1):33-47, 1992.
Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages 285-298. IEEE Computer Society, 1999. URL: http://dx.doi.org/10.1109/SFFCS.1999.814600.
Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. ACM Trans. Algorithms, 8(1):4:1-4:22, 2012. URL: http://dx.doi.org/10.1145/2071379.2071383.
Arnab Ganguly, Rahul Shah, and Sharma V Thankachan. Succinct non-overlapping indexing. In Annual Symposium on Combinatorial Pattern Matching, pages 185-195. Springer, 2015.
Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM J. Comput., 13(2):338-355, 1984. URL: http://dx.doi.org/10.1137/0213024.
Orgad Keller, Tsvi Kopelowitz, and Moshe Lewenstein. Range non-overlapping indexing and successive list indexing. In Frank K. H. A. Dehne, Jörg-Rüdiger Sack, and Norbert Zeh, editors, Algorithms and Data Structures, 10th International Workshop, WADS 2007, Halifax, Canada, August 15-17, 2007, Proceedings, volume 4619 of Lecture Notes in Computer Science, pages 625-636. Springer, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73951-7_54.
Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. URL: http://dx.doi.org/10.1137/0222058.
Yakov Nekrich and Gonzalo Navarro. Sorted range reporting. In Fedor V. Fomin and Petteri Kaski, editors, Algorithm Theory - SWAT 2012 - 13th Scandinavian Symposium and Workshops, Helsinki, Finland, July 4-6, 2012. Proceedings, volume 7357 of Lecture Notes in Computer Science, pages 271-282. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-31155-0_24.
Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. URL: http://dx.doi.org/10.1007/BF01206331.
Peter Weiner. Linear pattern matching algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1-11. IEEE Computer Society, 1973. URL: http://dx.doi.org/10.1109/SWAT.1973.13.

Non-Overlapping Indexing - Cache Obliviously

Authors Sahar Hooshmand, Paniz Abedin, M. Oguzhan Külekci, Sharma V. Thankachan

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message