Maxime Crochemore
Creative Commons Attribution 4.0 International license
We design alignment-free techniques for comparing a set of sequences, or a single word, called the target, against another set of words, called the reference. This is done by detecting factor patterns that distinguish the target from the reference. A target-specific factor of a target T against a reference R is a factor w of a word in T that is not a factor of any word in R, but all of whose proper factors are factors of words in R. The strategy is based on the notion of minimal absent (or forbidden) words. We first address the computation of the set of target-specific factors of a target T against a reference R, where T and R are finite sets of sequences. The first result is the construction of an automaton accepting the set of all target-specific factors under consideration. The construction algorithm runs in time linear in the total size of T ∪ R. The second result is the design of an algorithm that computes all occurrences, in a single sequence T, of its target-specific factors against R. This algorithm runs in real time on the target sequence, independently of the number of occurrences of target-specific factors.
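To make the definition above concrete, here is a minimal brute-force sketch in Python. It is not the authors' method: it enumerates all factors explicitly instead of building the linear-time automaton (DAWG/suffix automaton) described in the abstract, and it finds occurrences by rescanning rather than in real time. The function names `factors`, `target_specific_factors` and `occurrences` are hypothetical, chosen for illustration only. It relies on one observation: every proper factor of w is a factor of w with its first letter removed or of w with its last letter removed, so only those two need to be checked.

```python
def factors(words):
    """Return the set of all non-empty factors (substrings) of the given words."""
    out = set()
    for w in words:
        n = len(w)
        for i in range(n):
            for j in range(i + 1, n + 1):
                out.add(w[i:j])
    return out


def target_specific_factors(target, reference):
    """Brute-force sketch (hypothetical helper, not the paper's algorithm):
    return the factors w of words in `target` that are not factors of any
    word in `reference`, but whose proper factors all are.
    Checking w[1:] and w[:-1] suffices, since every other proper factor
    of w is a factor of one of them, and the factors of R are closed
    under taking factors."""
    ref = factors(reference)
    ref.add("")  # the empty word is a factor of every word
    result = set()
    for w in factors(target):
        if w not in ref and w[1:] in ref and w[:-1] in ref:
            result.add(w)
    return result


def occurrences(t, reference):
    """Return sorted (start, end) pairs, end exclusive, of every occurrence
    in the single sequence t of its target-specific factors against
    `reference`. Quadratic rescan, unlike the real-time algorithm."""
    tsf = target_specific_factors([t], reference)
    return sorted(
        (i, j)
        for i in range(len(t))
        for j in range(i + 1, len(t) + 1)
        if t[i:j] in tsf
    )
```

For instance, with reference `["abaab"]` and target `["abbab"]`, the target-specific factors are `bb` and `bab`: neither occurs in `abaab`, yet `b`, `ab` and `ba` all do. In the single sequence `abbab` they occur at positions `(1, 3)` and `(2, 5)`.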
@InProceedings{beal_et_al:OASIcs.Grossi.14,
author = {B\'{e}al, Marie-Pierre and Crochemore, Maxime},
title = {{Specific Patterns Against Reference Sequences}},
booktitle = {From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
pages = {14:1--14:12},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-391-1},
ISSN = {2190-6807},
year = {2025},
volume = {132},
editor = {Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.14},
URN = {urn:nbn:de:0030-drops-238130},
doi = {10.4230/OASIcs.Grossi.14},
annote = {Keywords: Specific pattern, Minimal absent word, Minimal forbidden word, Directed Acyclic Word Graph (DAWG), Suffix automaton}
}