,
Tatiana Starikovskaya
Creative Commons Attribution 4.0 International license
In this work, we study the problem of detecting periodic trends in strings. While detecting exact periodicity has been studied extensively, real-world data is often noisy, where small deviations or mismatches occur between repetitions. This work focuses on a generalized approach to period detection that efficiently handles noise. Given a string S of length n, the task is to identify integers p such that the prefix and the suffix of S, each of length n-p+1, are similar under a given distance measure. Ergün et al. [APPROX-RANDOM 2017] were the first to study this problem in the streaming model under the Hamming distance. In this work, we combine, in a non-trivial way, the Hamming distance sketch of Clifford et al. [SODA 2019] and the structural description of the k-mismatch occurrences of a pattern in a text by Charalampopoulos et al. [FOCS 2020] to present a more efficient streaming algorithm for period detection under the Hamming distance. As a corollary, we derive a streaming algorithm for detecting periods of strings which may contain wildcards, a special symbol that match any character of the alphabet. Our algorithm is not only more efficient than that of Ergün et al. [TCS 2020], but it also operates without their assumption that the string must be free of wildcards in its final characters. Additionally, we introduce the first two-pass streaming algorithm for computing periods under the edit distance by leveraging and extending the Bhattacharya-Koucký’s grammar decomposition technique [STOC 2023].
@InProceedings{elghazi_et_al:LIPIcs.ISAAC.2025.36,
author = {El Ghazi, Taha and Starikovskaya, Tatiana},
title = {{Streaming Periodicity with Mismatches, Wildcards, and Edits}},
booktitle = {36th International Symposium on Algorithms and Computation (ISAAC 2025)},
pages = {36:1--36:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-408-6},
ISSN = {1868-8969},
year = {2025},
volume = {359},
editor = {Chen, Ho-Lin and Hon, Wing-Kai and Tsai, Meng-Tsung},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2025.36},
URN = {urn:nbn:de:0030-drops-249446},
doi = {10.4230/LIPIcs.ISAAC.2025.36},
annote = {Keywords: approximate periods, pattern matching, streaming algorithms}
}