Given a set of pattern strings 𝒫 = {P₁, P₂,… P_k} and a text string S, the classic dictionary matching problem is to report all occurrences of each pattern in S. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let m and n be the total length of the patterns 𝒫 and the length of the text string S, respectively, and let ̅m and ̅n be the total number of runs in the run-length encoding of the patterns in 𝒫 and S, respectively. Our main result is an algorithm that achieves O(( ̅m + ̅n)log log m + occ) expected time, and O( ̅m) space, where occ is the total number of occurrences of patterns in S. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within an log log m factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in run-length encoded strings.
@InProceedings{bille_et_al:LIPIcs.CPM.2025.21, author = {Bille, Philip and G{\o}rtz, Inge Li and Puglisi, Simon J. and Tarnow, Simon R.}, title = {{Compressed Dictionary Matching on Run-Length Encoded Strings}}, booktitle = {36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)}, pages = {21:1--21:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-369-0}, ISSN = {1868-8969}, year = {2025}, volume = {331}, editor = {Bonizzoni, Paola and M\"{a}kinen, Veli}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.21}, URN = {urn:nbn:de:0030-drops-231158}, doi = {10.4230/LIPIcs.CPM.2025.21}, annote = {Keywords: Dictionary matching, run-length encoding, compressed pattern matching} }
Feedback for Dagstuhl Publishing