Dominik Köppl, Jannik Olbrich
Creative Commons Attribution 4.0 International license
Generalizations of plain strings have been proposed as a compact way to represent a collection of nearly identical sequences or to express uncertainty at specific text positions by enumerating all possibilities. While a plain string stores a single character at each of its positions, these generalizations store a set of characters (indeterminate strings), a set of strings of equal length (generalized degenerate strings, or GD strings for short), or a set of strings of arbitrary lengths (elastic-degenerate strings, or ED strings for short). These generalizations are important for compactly representing this type of data, and find applications in bioinformatics for representing and maintaining a set of genetic sequences of the same taxonomy or a multiple sequence alignment. To make these representations useful, attention has been drawn to answering various query types, such as pattern matching or measuring the similarity of ED strings, by generalizing techniques known for plain strings. However, for some types of queries, it has been shown that a generalization of a polynomial-time solvable query on classic strings becomes NP-hard on ED strings, e.g. [Russo et al., 2022]. In that light, we investigate other types of queries that are of particular interest to bioinformatics: unique substrings, absent words, anti-powers, longest previous factors, and Lempel-Ziv-like compression schemes. While we obtain a polynomial-time algorithm for a variation of longest previous factors, we show that all other problems are NP-hard to compute, some of them even under the restriction that the input can be modeled as an indeterminate or GD string.
@InProceedings{koppl_et_al:LIPIcs.CPM.2026.14,
author = {K\"{o}ppl, Dominik and Olbrich, Jannik},
title = {{Hardness Results on Characteristics for Elastic-Degenerate Strings}},
booktitle = {37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
pages = {14:1--14:25},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-420-8},
ISSN = {1868-8969},
year = {2026},
volume = {369},
editor = {Bille, Philip and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.14},
URN = {urn:nbn:de:0030-drops-259409},
doi = {10.4230/LIPIcs.CPM.2026.14},
annote = {Keywords: Elastic-degenerate strings, NP-hardness, longest common factor, minimal unique substring, minimal absent word, anti-power, longest previous factor}
}