Che-Wei Tsao, Wing-Kai Hon, Dominik Köppl
Creative Commons Attribution 4.0 International license
A string S is called a square if it can be written as the concatenation of two identical strings. Two strings P and Q of the same length are said to square match if, for every substring of P, it is a square if and only if the corresponding substring of Q is also a square. The square pattern matching problem asks for locating all substrings of a given text T of length n that square match a query pattern P of length m. This notion captures similarity in repetition structures and is motivated by applications in areas such as bioinformatics and music structure analysis. In this paper, we introduce a novel technique, called the longest prefix square (LPS) encoding, which represents the square structure of a string as an integer array of the same length. We show that two strings square match if and only if they have identical LPS encodings. Based on this result, we construct an index solving the square pattern matching problem in time O(m lg m + occ) using O(n lg² n) bits of space, where occ denotes the number of occurrences of substrings in T that square match P. If the LPS encoding of P is precomputed, the query time improves to O(m + occ).
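To make the definitions in the abstract concrete, the following is a minimal brute-force sketch in Python that checks square matching directly from the definition and scans a text for matching windows. It is not the paper's index and does not use the LPS encoding; the function names `is_square`, `square_match`, and `square_pattern_matching` are hypothetical, and the sketch assumes that only non-empty strings count as squares.

```python
def is_square(s: str) -> bool:
    """True if s is the concatenation of two identical non-empty strings.

    Assumption: the empty string is not treated as a square.
    """
    half, rem = divmod(len(s), 2)
    return half > 0 and rem == 0 and s[:half] == s[half:]


def square_match(p: str, q: str) -> bool:
    """Check the definition directly: p and q have the same length and
    p[i:j] is a square exactly when q[i:j] is a square, for all i < j.

    Only even-length substrings are compared, since odd-length
    substrings are never squares in either string.
    """
    if len(p) != len(q):
        return False
    n = len(p)
    return all(
        is_square(p[i:j]) == is_square(q[i:j])
        for i in range(n)
        for j in range(i + 2, n + 1, 2)
    )


def square_pattern_matching(text: str, pattern: str) -> list[int]:
    """Return all start positions i such that text[i:i+m] square matches
    pattern, by testing every window (a naive baseline, far slower than
    the O(m lg m + occ) query time stated in the abstract)."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if square_match(text[i:i + m], pattern)
    ]
```

For instance, `square_pattern_matching("aabbab", "xxy")` tests the four windows `aab`, `abb`, `bba`, and `bab`; a window matches when its first two characters form a square and its last two do not, as in `xxy`.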
@InProceedings{chen_et_al:LIPIcs.CPM.2026.35,
author = {Chen, Po-Chun and Tsao, Che-Wei and Hon, Wing-Kai and K\"{o}ppl, Dominik},
title = {{Efficient Index for Square Pattern Matching}},
booktitle = {37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
pages = {35:1--35:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-420-8},
ISSN = {1868-8969},
year = {2026},
volume = {369},
editor = {Bille, Philip and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.35},
URN = {urn:nbn:de:0030-drops-259617},
doi = {10.4230/LIPIcs.CPM.2026.35},
annote = {Keywords: string algorithms, pattern matching, indexing, squares}
}