Consensus Strings with Small Maximum Distance and Small Distance Sum

The parameterised complexity of various consensus string problems (Closest String, Closest Substring, Closest String with Outliers) is investigated in a more general setting, i. e., with a bound on the maximum Hamming distance and a bound on the sum of Hamming distances between solution and input strings. We completely settle the parameterised complexity of these generalised variants of Closest String and Closest Substring, and partly for Closest String with Outliers; in addition, we answer some open questions from the literature regarding the classical problem variants with only one distance bound. Finally, we investigate the question of polynomial kernels and respective lower bounds.


Introduction
Consensus string problems have the following general form: given input strings S = {s_1, ..., s_k} and a distance bound d, find a string s with distance at most d from the input strings. With the Hamming distance as the central distance measure for strings, there are two obvious types of distance between a single string s and a set S of strings: the maximum distance between s and any string from S (called radius) and the sum of all distances between s and strings from S (called distance sum). The most basic consensus string problem is Closest String, where we get a set S of k length-ℓ strings and a bound d, and ask whether there exists a length-ℓ solution string s with radius at most d. This problem is NP-complete (see [19]), but fixed-parameter tractable for many variants (see [20]), including the parameterisation by d, which in biological applications can often be assumed to be small (see [14,21]). A classical extension is Closest Substring, where the strings of S have length at most ℓ, the solution string must have a given length m and the radius bound d is with respect to some length-m substrings of the input strings. A parameterised complexity analysis (see [15,16,25]) has shown Closest Substring to be harder than Closest String. If we bound the distance sum instead of the radius, then Closest String collapses to a trivial problem, while Closest Substring, which is then called Consensus Patterns, remains NP-complete (see [23]). Closest String with Outliers is a recent extension, which is defined like Closest String, but with the possibility to ignore a given number t of input strings (see [6]).
The main motivation for consensus string problems comes from the important task of finding similar regions in DNA or other protein sequences, which arises in many different contexts of computational biology, e. g., universal PCR primer design [11,21,24,28], genetic probe design [21], antisense drug design [10,21], finding transcription factor binding sites in genomic data [30], determining an unbiased consensus of a protein family [3], and motif-recognition [21,26,27]. The consensus string problems are a formalisation of these computational tasks and most variants of them are NP-hard. However, due to their high practical relevance, it is necessary to solve them despite their intractability, which has motivated the study of their approximability, on the one hand, but also their fixed-parameter tractability, on the other (see the survey [7] for an overview of the parameterised complexity of consensus string problems). This work is a contribution to the latter branch of research. In the following, we motivate in more detail the research carried out in this paper.
From a theoretical point of view, these consensus string problems (as is usually the case for string problems) have a large number of quite natural and obvious numerical parameters, e. g., number of input strings, their lengths, alphabet size, the distance bounds and so on. Therefore, from a parameterised complexity point of view, they have a somewhat different nature than the typical graph problems, for which we have the obvious standard parameterisations (usually some size bound that is part of the input, e. g., the clique-size for Clique) or more complex structural parameters (like width-parameters such as treewidth and so on), while obvious numerical parameters, e. g., number of vertices or edges, are usually not interesting (with the degree of a graph being an exception). Consequently, for string problems, the challenge is to discover among the rather large number of different combinations of these obvious parameters those that yield fixed-parameter tractability; thus, obtaining a complete "map of fixed-parameter tractability" of the problem.
From a more practical point of view, we note that for string problems, which are usually motivated by tasks from computational biology, it is often the case that it is known which parameters can be considered to be small in practical scenarios and which do not have this desirable property. This leads to parameters (or parameter combinations) that are more important than others. Consequently, the most pressing question is whether we can achieve fixed-parameter tractability for these "small" parameters. Furthermore, the knowledge of which parameters are important may guide an algorithmic engineering process, e. g., if we have achieved fixed-parameter tractability with respect to an important parameter, but the problem formalisation does not quite cover the practical scenario, we can search for modifications of the problem that maintain the fixed-parameter tractability and are still suitable for practical scenarios with small parameter values. For example, as explained in [6], in practical applications of consensus string problems it cannot always be avoided that the set of input strings includes a small number of strings that are quite different from all the others. In order to still get a solution, we would have to drastically increase the radius bound, which also leads to a solution that is undesirable from a practical point of view. Instead, it makes much more sense to directly cater for this presence of "outliers" by modifying the problem formulation accordingly. This is the motivation for the outlier-variant introduced in [6] (note that Closest String with Outliers parameterised by the radius bound and number of outliers is fixed-parameter tractable [6]). In particular, if we find a suitable solution string when some outliers are excluded, then it seems natural that the initial decision of including these strings needs to be revised (as pointed out in [6], this is another motivation for the outlier-variant).
In this work, we propose a different modification, which leads to a generalisation of all the consensus string problems mentioned above: we consider the case where we have a radius bound and a distance sum bound at the same time. From a theoretical point of view, this leads to the question of which of the fixed-parameter tractable cases of the variants with only one bound are still fixed-parameter tractable if we consider both bounds. However, we believe this problem can also be relevant from a practical point of view, since having both a radius bound and a distance sum bound allows for a finer tuning of the solutions (similarly to the addition of outliers). We shall motivate this by an example.
Assume that by solving the outlier-variant for a set of strings, we have found out that our desired radius bound can only be met by declaring strings as outliers that should not be outliers (i. e., strings for which we know for certain that they should be included in the input set), or that a solution string cannot be found for a number of outliers that is small enough that the algorithm's running time is still acceptable. In this case, slightly increasing the radius bound seems inevitable, but it is still reasonable to require that this larger distance to the solution string should be used to its full capacity only by a small number of input strings. This requirement could be formulated by adding a distance sum bound that is significantly smaller than the number of input strings multiplied by the radius bound. It is also reasonable to think about allowing both, a small number of outliers that handles strings that should not be in the input set at all and a distance sum bound that takes care of bounding the number of "high distance" input strings.
Next, we define more formally the consensus string problems considered in this paper and then explain in full detail the respective known results in the literature and our new contributions.

Problem Definition
Let Σ be a finite alphabet, Σ* be the set of all strings over Σ, including the empty string ε, and Σ+ = Σ* \ {ε}. For w ∈ Σ*, |w| is the length of w and, for every i, 1 ≤ i ≤ |w|, by w[i] we refer to the symbol at position i of w. For every n ∈ N ∪ {0}, let Σ^n = {w ∈ Σ* | |w| = n} and Σ^{≤n} = ⋃_{i=0}^{n} Σ^i. By ⪯, we denote the substring relation over the set of strings, i. e., for u, v ∈ Σ*, u ⪯ v if v = xuy, for some x, y ∈ Σ*. We use the concatenation of sets of strings as usually defined. For strings u and v of equal length, d_H(u, v) denotes their Hamming distance and, for a string s and a set S of strings of the same length as s, r_H(s, S) denotes the radius, i. e., the maximum of d_H(s, s') over all s' ∈ S, and s_H(s, S) denotes the distance sum, i. e., the sum of d_H(s, s') over all s' ∈ S.
Next, we state the consensus string problems to be investigated. The most basic one is (r, s)-Closest String [denoted by (r, s)-CloseStr in the following]:
(r, s)-CloseStr
Instance: A set S = {s_1, s_2, ..., s_k} ⊆ Σ^ℓ and integers d_r, d_s ∈ N.
Question: Is there an s ∈ Σ^ℓ with r_H(s, S) ≤ d_r and s_H(s, S) ≤ d_s?
If we allow a given number of the input strings to be excluded and require the bounds d_r and d_s to be satisfied with respect to the remaining strings, we obtain the problem (r, s)-Closest String with Outliers [this will also be called the outlier-variant (of (r, s)-CloseStr) and will be denoted by (r, s)-CloseStrwo]:
(r, s)-CloseStrwo
Instance: A set S = {s_1, s_2, ..., s_k} ⊆ Σ^ℓ and integers d_r, d_s, t ∈ N.
Question: Is there an s ∈ Σ^ℓ and S' ⊆ S with |S'| = k − t such that r_H(s, S') ≤ d_r and s_H(s, S') ≤ d_s?
For the problem (r, s)-Closest Substring [which will also be called the substring-variant (of (r, s)-CloseStr) and is denoted by (r, s)-CloseSubstr], the input words can have different lengths and we are asking for a string that satisfies the bounds d_r and d_s with respect to some substrings of the input strings (that all have the same given length):
(r, s)-CloseSubstr
Instance: A set S = {s_1, s_2, ..., s_k} ⊆ Σ^{≤ℓ} and integers m, d_r, d_s ∈ N.
Question: Is there an s ∈ Σ^m and a set S' = {s'_1, s'_2, ..., s'_k} with s'_i ⪯ s_i and |s'_i| = m, 1 ≤ i ≤ k, such that r_H(s, S') ≤ d_r and s_H(s, S') ≤ d_s?
See Fig. 1 for an illustration of these problems. We next introduce some convenient terminology.
Fig. 1 Illustrations of instances and solution strings for different variants of consensus string problems (mismatches are highlighted by gray circles): a shows a CloseStr instance and a solution string with radius 3 and distance sum 13, b shows the same instance, but with a solution string with radius 2 and distance sum 16, c shows a CloseSubstr instance (with m = 4) and a solution string with radius 1 and distance sum 7 (the corresponding substrings are highlighted by gray rectangles), and d shows a CloseStr-wo instance (with t = 2) and a solution string with radius 2 and distance sum 7 (s_3 and s_7 are declared outliers)
By the terms (r)-CloseStr and (s)-CloseStr, we denote the variants of (r, s)-CloseStr, where the only distance bound is d_r or d_s, respectively; we shall also call them the (r)- and (s)-variant of CloseStr, the radius and distance sum variant of CloseStr, or simply the single-bound variants if we refer to either of them; the problem (r, s)-CloseStr will sometimes be referred to as the general variant. We also use the term CloseStr [i. e., without the prefixes (r, s)-, (r)- or (s)-] whenever we generally refer to all (or any) of these different variants. Analogous terminology applies to the outlier-variant and the substring-variant.

Parameterised Complexity Theory
We assume the reader to be familiar with the basic concepts of (classical) complexity theory. Next, we shall briefly summarise the fundamentals of parameterised complexity (see also [9,13,17], and, especially for information on kernelisation, the recent textbook [18]).
A parameterised problem is a decision problem with instances (x, k), where x is the actual input and k ∈ N is the parameter. By FPT, we denote the class of fixed-parameter tractable problems, i. e., problems having an algorithm with running-time O(f(k) · g(n)), for a computable function f and polynomial g, where n is the size of the instance and k is the parameter. In order to argue about fixed-parameter intractability, we need the following kind of reductions. A (classical) many-one reduction R from a parameterised problem to another is an fpt-reduction, if the parameter of the target problem is bounded in terms of the parameter of the source problem, i. e., there is a computable function h : N → N such that R(x, k) = (x', k') implies k' ≤ h(k). Parameterised problems that are hard (with respect to fpt-reductions) for the class W[1] are not in FPT (under some complexity-theoretical assumptions, see [9,13,17] for further details). If a parameterised problem is NP-hard when the parameter is fixed to a constant, then it is not in FPT, unless NP = P (thus providing even stronger evidence for fixed-parameter intractability than W[1]-hardness).
A kernelisation for a parameterised problem P is an algorithm that transforms an instance (x, k) of P into a reduced instance (x', k') of P in time polynomial in |x| + |k| such that
- |x'| + k' ≤ g(k) for some computable function g,
- (x, k) is a positive instance if and only if (x', k') is a positive instance.
For the sake of convenience, we also say that a parameterised problem has a kernel in order to denote that there is a kernelisation as defined above. If the kernelisation is such that the function g is a polynomial, then we say that the problem has a polynomial kernel. It is a well-known fact that a parameterised problem is fixed-parameter tractable if and only if it has a kernel. On the other hand, many fixed-parameter tractable problems do not seem to have a polynomial kernel.
Note that all these concepts from parameterised complexity theory naturally extend to problems that are parameterised by several parameters at the same time.
The natural parameters that arise in the context of the consensus string problems defined above are the following (we shall consistently use these parameter names throughout the remainder of the paper): the number k of input strings, their length ℓ, the alphabet size |Σ|, the distance bounds d_r and d_s, the number t of outliers and the number k − t of inliers (for the outlier-variants), and the length m of the solution string (for the substring-variants).
For some parameters p_1, p_2, ..., p_q, by (r, s)-CloseStr(p_1, p_2, ..., p_q) we denote the problem (r, s)-CloseStr parameterised by the parameters p_1, p_2, ..., p_q, e. g., (r, s)-CloseStr(|Σ|, ℓ) is the problem (r, s)-CloseStr parameterised by the alphabet size and the length of the input strings. Note that this problem is trivially in FPT, since enumerating all strings in Σ^ℓ and checking for each whether it is a solution string is an fpt-algorithm. Moreover, this variant does not seem to have a polynomial kernel (in fact, it can be shown that, under some complexity theoretical assumption, it does not have a polynomial kernel; see Sect. 5), while (r, s)-CloseStr(k, ℓ) trivially has a polynomial kernel (the original input is of size ℓ × k and is therefore itself a polynomial kernel).
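The enumeration argument for the parameters (|Σ|, ℓ) can be illustrated by a minimal Python sketch. The function names `hamming` and `brute_force_close_str` are hypothetical and chosen only for this illustration; the sketch tries all |Σ|^ℓ candidate strings and checks both bounds for each.

```python
from itertools import product


def hamming(u, v):
    """Hamming distance between two sequences of equal length."""
    return sum(a != b for a, b in zip(u, v))


def brute_force_close_str(strings, alphabet, d_r, d_s):
    """Return some string with radius <= d_r and distance sum <= d_s, or None.

    Enumerates all |alphabet|^l candidates, so the running time is
    bounded by a function of (|alphabet|, l) times a polynomial.
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        dists = [hamming(candidate, s) for s in strings]
        if max(dists) <= d_r and sum(dists) <= d_s:
            return "".join(candidate)
    return None
```

For instance, `brute_force_close_str(["aab", "abb", "bab"], "ab", 1, 2)` examines the eight binary strings of length 3 in lexicographic order and stops at the first one that meets both bounds.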
We use analogous terminology for the substring and outlier-variants and also for the single-bound variants, e. g., (r)-CloseStrwo(d r , t). Note that we consider parameters t and k−t only for the outlier-variants, parameter m only for the substring-variants, and parameters d r and d s only if they exist for the problem, e. g., d r can only be a parameter for the general variants or the (r)-variants, but not for (s)-variants.

Known Results
Some of the single-bound variants of the consensus string problems have already been considered in the literature, but under different names. More precisely, the names Closest String and Closest Substring are common in the literature in order to denote the radius variants of CloseStr and CloseSubstr, while the common term Consensus Patterns usually refers to what we have defined as the distance sum variant of CloseSubstr (see, e. g., [15,16,19,20,25]); the term Closest String with Outliers is used in [6] (where the outlier-variant is also introduced for the first time) in order to denote the radius variant of CloseStr-wo.
All the consensus string problems are NP-hard, except the distance sum variant of CloseStr, which is a trivial problem (choosing for every column a symbol with majority always yields an optimal solution string). The parameterised complexity (with respect to the above-mentioned parameters) of the radius variant of CloseStr is completely settled (see [19,20]): parameterising by any of the single parameters k, d_r and ℓ yields fixed-parameter tractability, while the problem remains NP-hard if |Σ| = 2. To the knowledge of the authors, the complexity of the variant (r, s)-CloseStr with both bounds has not yet been investigated in the literature (an exception is [1], where optimising both the radius and the distance sum has been considered for the special case k = 3). Another point of view consists in seeing (r)-CloseStr and (s)-CloseStr as two special cases of a more general problem, which aims at minimizing the L_p norm of the vector of distances between the solution string and the input strings (with p = ∞ and p = 1, respectively). Thus these two extreme optimization goals may be combined by taking intermediate values of p: for binary strings, the problem is NP-hard for each 1 < p < ∞, and fixed-parameter tractable when parameterised by k or by (d, p) [8].
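The remark that the distance sum variant of CloseStr is trivial can be made concrete with a short Python sketch. The names `majority_string` and `distance_sum` are hypothetical; ties between equally frequent symbols are broken arbitrarily (here: by first occurrence in the column), which does not affect the distance sum.

```python
from collections import Counter


def majority_string(strings):
    """Pick, for every column, a symbol with a maximum number of occurrences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def distance_sum(s, strings):
    """Sum of the Hamming distances between s and all input strings."""
    return sum(a != b for t in strings for a, b in zip(s, t))
```

An instance of (s)-CloseStr is then positive if and only if `distance_sum(majority_string(S), S)` is at most d_s, since columns contribute to the distance sum independently of each other.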
With respect to the substring-variants, all parameterisations of the radius variant have been settled, while for the distance sum variant all parameterisations but the single parameter ℓ [or (m, ℓ), which, since we can assume m ≤ ℓ, is the same] have been settled (see [15,16,25]). These results show that, at least for the single-bound variants, CloseSubstr is a much harder problem than CloseStr. More precisely, fixed-parameter tractability of (r)-CloseSubstr can only be achieved if parameterised by ℓ (see [15]) or (m, |Σ|) (which is trivial), while all other parameterisations are W[1]-hard. With respect to (s)-CloseSubstr, the only known fixed-parameter tractable cases are with respect to (d_s, |Σ|) (see [25]) and (m, |Σ|) (which is again trivial), and the case of parameter ℓ is open. However, it has been shown in [29] that if we consider the difference between the length of the input strings and the length of the solution string, i. e., (ℓ − m), as a parameter, then adding any of the additional parameters k, d_r, ℓ [which also make (r)-CloseStr fixed-parameter tractable] yields fixed-parameter tractability for (r)-CloseSubstr. For the distance sum variant, the special parameter (ℓ − m) only helps if additionally k or d_s is also a parameter, while the case ((ℓ − m) = 4, |Σ| = 4) is even NP-hard and the parameterisation ((ℓ − m), m) is the same as ℓ, which again leads to the open case mentioned above (see [29] for details). As for CloseStr, the complexity of (r, s)-CloseSubstr has not yet been investigated in the literature.
A parameterised complexity analysis of the radius variant of CloseStr-wo has been started more recently in [6], where it is shown that the problem is fixed-parameter tractable with respect to the single parameter d_r and with respect to the parameters (|Σ|, k). Neither the (s)-variant nor the general variant with both bounds has yet been considered in the literature.
Questions of kernelisations for consensus string problems have been recently investigated in [2].

Our Contribution
The main contribution of this paper is to initiate the parameterised complexity analysis of the general variants (i. e., with both the radius and the distance sum bound) of the consensus string problems. In this regard, we are able to completely settle (i. e., proving either fixed-parameter tractability or W[1]-hardness for all parameterisations with respect to the parameters defined in Sect. 1.2) the problems (r, s)-CloseStr and (r, s)-CloseSubstr (and their single-bound variants). Obviously, as indicated by the discussions of Sect. 1.3, a large part of this complete picture is already provided in the existing literature, namely almost all the single-bound variants. Moreover, some of the results for the general variants can be concluded with moderate effort from results on the single-bound variants. What required more effort was to close the gap that was left in the literature with respect to (s)-CloseSubstr (see Sect. 1.3) and to carry over the fixed-parameter tractability from (r)-CloseStr(k) to (r, s)-CloseStr(k).
With respect to the outlier-variant, we are able to settle some more open problems from the literature, but the fixed-parameter tractability of many parameterisations remains unsettled. Our main positive algorithmic result is that (r, s)-CloseStrwo(d_r, t) [and therefore (r, s)-CloseStr(d_r)] is fixed-parameter tractable, which is achieved by a non-trivial extension of a branching algorithm from [20] for (r)-CloseStr(d_r). While the general branching strategy is analogous to the one of [20], taking care of the distance sum bound and of the outliers requires some new ideas and leads to a more involved algorithm with a more complicated proof of correctness. While this is interesting from a theoretical point of view, it is particularly interesting in the light of the discussions at the beginning of Sect. 1 about the practical relevance of parameters d_r and t. In addition to several other simpler fixed-parameter tractability results, we show, as the main negative result with respect to the outlier-variant, a hardness result for the (s)-variant; to the knowledge of the authors, this also constitutes the first proof of NP-hardness of (s)-CloseStrwo. In particular, this shows that unlike CloseStr, for which the radius variant is hard and the distance sum variant is trivial, the outlier-variant resembles the substring-variant, where both single-bound variants are hard (note that the general hardness of the (s)-variant of CloseStr-wo was not known).
Finally, we investigate the question whether the fixed-parameter tractable variants of the considered consensus string problems allow polynomial kernels; thus, continuing a line of work initiated by Basavaraju et al. [2], in which kernelisation lower bounds for (r)-CloseStr and (r)-CloseSubstr are proved. Some results from [2] about the single-bound variants directly carry over to the general variants; our main contribution is a cross-composition from (r)-CloseStr into (r)-CloseStrwo, which rules out a polynomial kernel for (r, s)-CloseStrwo(d_r, d_s, ℓ, (k − t), |Σ|).

Organisation of the Paper
In Sect. 2, we settle all parameterisations of the problem (r, s)-CloseStr. Then, in Sect. 3, we consider the general as well as the single-bound variants of the outlier-variant; this section also contains our main result, i. e., the branching algorithm for (r, s)-CloseStrwo(d_r, t). The substring-variant will then be investigated in Sect. 4 and questions about kernelisations will be discussed in Sect. 5. Finally, in Sect. 6, we summarise and discuss our results and mention the most interesting open problems.

Closest String with Radius and Distance Sum Bound
We shall first give some useful definitions. It will be convenient to treat a set S = {s_i | 1 ≤ i ≤ k} ⊆ Σ^ℓ as a k × ℓ matrix with entries from Σ. By the term column of S, we refer to the transpose of a column of the matrix S, which is an element from Σ^k; thus, the introduced string notations apply, e. g., if c is the ith column of S, then c[j] = s_j[i]. A string s ∈ Σ^ℓ is a majority string for S if, for every i, 1 ≤ i ≤ ℓ, s[i] is a symbol with a maximum number of occurrences in the ith column of S; note that s_H(s, S) is minimal among all strings from Σ^ℓ if and only if s is a majority string for S. We call a string s ∈ Σ^ℓ radius optimal or distance sum optimal (with respect to a set S ⊆ Σ^ℓ) if r_H(s, S) or s_H(s, S), respectively, is minimal among all strings from Σ^ℓ.
It is a well-known fact that (r)-CloseStr allows fpt-algorithms for any of the single parameters k, d_r or ℓ, and it is still NP-hard for |Σ| = 2 (see [20]). While the latter hardness result trivially carries over to (r, s)-CloseStr (by setting d_s = k · d_r), we have to modify the fpt-algorithms for extending the fixed-parameter tractability results to (r, s)-CloseStr.
We start with parameter k, for which we can extend the ILP-approach that is used in [20] to show (r)-CloseStr(k) ∈ FPT. Before we can formally do this, we need a few more definitions.
We say that S ⊆ Σ^ℓ is normalised, if Σ = {a_1, a_2, ..., a_k}, every column of S contains the symbols {a_1, a_2, ..., a_p}, for some p, 1 ≤ p ≤ k, and the first occurrence of a_i, 1 ≤ i ≤ p − 1, occurs before the first occurrence of a_{i+1}. If S is normalised, any two isomorphic columns are equal (i. e., if two columns are not identical, then it is not possible to obtain one from the other by bijective renaming of the symbols). It can be easily seen that any (r, s)-CloseStr instance can be transformed into an equivalent one with normalised S (see [20]). For both normalised and non-normalised S, we use the term column-types to denote the different forms of columns, rather than the collection of all columns, i. e., the set of column types of S is {c ∈ Σ^k | c occurs as a column in S}.
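As a minimal illustration of normalisation and column types, the following Python sketch renames the symbols of every column in order of first occurrence. The function names are hypothetical, and the symbols a_1, a_2, ... are represented by the integers 0, 1, ... purely as an assumption of this sketch.

```python
def normalise_columns(strings):
    """Return the columns of the instance after normalisation.

    Every column is renamed independently: its first symbol becomes 0,
    the next new symbol becomes 1, and so on.
    """
    columns = []
    for col in zip(*strings):
        rename = {}
        # setdefault assigns the next free index to a symbol seen for the first time
        columns.append(tuple(rename.setdefault(sym, len(rename)) for sym in col))
    return columns


def column_types(strings):
    """Set of distinct normalised columns (the column types)."""
    return set(normalise_columns(strings))
```

After normalisation, the two columns of `["ab", "ba"]` coincide, which reflects the statement that isomorphic columns become equal.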
Proof We extend the ILP-approach that has been used in [20] to show (r)-CloseStr(k) ∈ FPT. We can assume that S is normalised, so the number of column types is at most B(k) (where B(k) is the Bell number). We extend the ILP from [20] to an ILP that has a solution if and only if the (r, s)-CloseStr instance has a solution. For every column type t and every a ∈ Σ, the variable x_{t,a} stands for the number |{j | 1 ≤ j ≤ ℓ, the jth column of S has type t, s[j] = a}|, where s is the hypothetical solution string. Intuitively speaking, the number x_{t,a} says how often a column of type t is paired with an occurrence of the symbol a in the solution string. The inequalities of the ILP are as follows: for every column type t, the variables x_{t,a}, a ∈ Σ, sum up to the number of columns of S of type t; for every i, 1 ≤ i ≤ k, the sum of all x_{t,a} with t[i] ≠ a is at most d_r; and the sum of all these sums, taken over all i, 1 ≤ i ≤ k, is at most d_s. Since we can assume |Σ| ≤ k, we have at most k · B(k) variables and the result follows from the fact that ILP parameterised by the number of variables is in FPT (see [22]).
Next, we consider the parameter d_r. For the (r)-variant of CloseStr, the fixed-parameter tractability with respect to d_r is shown in [20] by a branching algorithm, which has proved to be rather versatile: it has successfully been extended in [6] to (r)-CloseStrwo(d_r, t) and in [29] to (r)-CloseSubstr(d_r, (ℓ − m)). We shall next briefly sketch this algorithm from [20].
Let S = {s_1, s_2, ..., s_k} ⊆ Σ^ℓ, d_r ∈ N be an (r)-CloseStr instance and assume that there is a solution string s. If s' = s_1 is not a solution string, i. e., r_H(s', S) ≥ d_r + 1, then there is some s_i ∈ S with d_H(s', s_i) ≥ d_r + 1. Among any d_r + 1 positions in which s' and s_i differ, there is at least one position j with s[j] = s_i[j] (otherwise d_H(s, s_i) ≥ d_r + 1, which is a contradiction). Obviously, in order to transform s' to the solution string, this position j of s' must be changed to the one of s_i. Consequently, arbitrarily choosing a set P ⊆ {j | s'[j] ≠ s_i[j]} of cardinality d_r + 1, branching over all these d_r + 1 positions and changing them in s' to the corresponding positions in s_i yields a branching algorithm for (r)-CloseStr (note that a branching depth of d_r is sufficient, since it must be possible to reach the solution string by changing at most d_r positions of s_1).
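The branching strategy sketched above can be written down as a short recursive Python function. This is only an illustrative sketch under the assumptions stated in the comments, not the implementation from [20]; the names `hamming` and `closest_string_branch` are hypothetical.

```python
def hamming(u, v):
    """Hamming distance between two strings of equal length."""
    return sum(a != b for a, b in zip(u, v))


def closest_string_branch(strings, d_r):
    """Search for a string with radius <= d_r by bounded branching.

    Starts with the first input string as candidate and uses a
    branching depth of d_r, as in the sketch of the algorithm above.
    """
    def search(candidate, depth):
        # Find an input string that is too far from the current candidate.
        far = next((s for s in strings if hamming(candidate, s) > d_r), None)
        if far is None:
            return candidate
        if depth == 0:
            return None
        mismatches = [j for j in range(len(candidate)) if candidate[j] != far[j]]
        # Branch over an arbitrary set of d_r + 1 mismatch positions.
        for j in mismatches[:d_r + 1]:
            changed = candidate[:j] + far[j] + candidate[j + 1:]
            result = search(changed, depth - 1)
            if result is not None:
                return result
        return None

    return search(strings[0], d_r)
```

The search tree has at most (d_r + 1)^{d_r} leaves, and every node needs only polynomial time, which is what makes the approach an fpt-algorithm for the parameter d_r.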
We propose an extension of the same branching algorithm, that allows for a bound d s on the distance sum; thus, it works for (r, s)-CloseStr(d r ). In fact, we prove in Theorem 5 an even stronger result, where we also extend the algorithm to exclude up to t outlier strings from the input set S, i. e., we extend it to the problem (r, s)-CloseStrwo(d r , t). Since Theorem 2 can therefore be seen as a corollary of this result by taking t = 0, we only give an informal description of a direct approach that solves (r, s)-CloseStr(d r ) (and refer to Theorem 5 for a formal proof of correctness).
The main problem in extending the algorithm to the case of an additional bound d_s on the distance sum can be described as follows. If we start with some input string as the first candidate string and then carry out the branching as sketched above, then we have no guarantee that the resulting solution satisfies the distance sum bound d_s. On the other hand, if we start with some other candidate string that is somehow tailored to the distance sum bound, we lose the guarantee that a solution can be reached by a number of changes that only depends on d_r (which is trivially the case if we start with an input string).
An obvious choice for a first candidate string for a branching algorithm that also takes the distance sum bound into consideration is a majority string (see Fig. 2), since this is the "best" string with respect to the distance sum bound. Starting with this string, we can apply the same branching strategy in order to change it step by step into a string that satisfies the radius bound. However, this can only result in a valid fpt-algorithm (with respect to parameter d r ), if the branching depth can be bounded by a function in d r , which is done by the following lemma [that we also need later for the proof of correctness of the algorithm for (r, s)-CloseStrwo(d r , t)].

Lemma 1 Let S ⊆ Σ^ℓ, d_r ∈ N and s ∈ Σ^ℓ with r_H(s, S) ≤ d_r, and let s_m be a majority string for S. Then d_H(s, s_m) ≤ 2d_r.
A branching algorithm for (r, s)-CloseStr(d_r) can now be sketched as follows. We start with a majority string s_m and apply the branching as described above. The branching depth is bounded by 2d_r (due to Lemma 1) and we cut any branch where the distance sum goes beyond the threshold d_s. If there exists a solution that satisfies the d_r bound, then there must be a path in the branching tree in which all changes of single positions are necessary, and, since we started with a majority string, all unchanged positions have a symbol that causes the fewest additional mismatches (for a formal proof of correctness, we refer to Theorem 5).
Fig. 2 b The same matrix of strings, its refined majority string and the disputed columns highlighted in grey
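The informal description above (start with a majority string, branch to depth 2d_r, cut branches whose distance sum exceeds d_s) can be sketched in Python as follows. This is a sketch of the informal approach only, not of the formal algorithm for the outlier-variant; the name `solve_close_str_rs` is hypothetical.

```python
from collections import Counter


def hamming(u, v):
    """Hamming distance between two strings of equal length."""
    return sum(a != b for a, b in zip(u, v))


def solve_close_str_rs(strings, d_r, d_s):
    """Return a string with radius <= d_r and distance sum <= d_s, or None."""
    # Start with a majority string (ties are broken arbitrarily).
    start = "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))

    def search(candidate, depth):
        dists = [hamming(candidate, s) for s in strings]
        if sum(dists) > d_s:
            return None  # cut: the distance sum bound is already violated
        far = next((s for s, d in zip(strings, dists) if d > d_r), None)
        if far is None:
            return candidate  # both bounds are satisfied
        if depth == 0:
            return None
        mismatches = [j for j in range(len(candidate)) if candidate[j] != far[j]]
        for j in mismatches[:d_r + 1]:
            changed = candidate[:j] + far[j] + candidate[j + 1:]
            result = search(changed, depth - 1)
            if result is not None:
                return result
        return None

    return search(start, 2 * d_r)
```

On the instance `["aaaa", "aaaa", "aaaa", "bbbb"]` with d_r = 2, the majority string `aaaa` violates the radius bound, and every string satisfying it has distance sum 8; the sketch therefore finds a solution for d_s = 8 and none for d_s = 7.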
It only remains to take a look at the parameters ℓ and d_s, for which containment in FPT follows easily from known results. More precisely, we can assume d_r ≤ ℓ and we can further assume that every column of S contains at least two different symbols (all columns without this property could be removed), which implies s_H(s, S) ≥ ℓ for every s ∈ Σ^ℓ; thus, we can assume ℓ ≤ d_s. Consequently, the fixed-parameter tractability with respect to d_r carries over to these parameters, i. e., (r, s)-CloseStr(ℓ) ∈ FPT and (r, s)-CloseStr(d_s) ∈ FPT.
This completely settles the parameterised complexity of (r, s)-CloseStr with respect to parameters k, d_r, d_s, |Σ| and ℓ (see Table 1 for an overview of the results). Recall that the (r)-variant is already settled, while the (s)-variant is trivial.

The Outlier-Variant
In this section, we investigate (r, s)-CloseStrwo and its (r)- and (s)-variants. We first prove several fixed-parameter tractability results for the general variant and we consider the (r)- and (s)-variants later on.
First, we note that solving an instance of (r, s)-CloseStrwo(k) can be reduced to solving (k choose t) ≤ 2^k many (r, s)-CloseStr(k) instances, which, due to the fixed-parameter tractability of the latter problem, yields the fixed-parameter tractability of the former. We next show that if the number k − t of inliers exceeds d_s, then an (r, s)-CloseStrwo instance becomes easily solvable; thus, k − t can be bounded by d_s. If in addition t is also a parameter, this implies that k is bounded, so fixed-parameter tractability follows from Theorem 3.
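The reduction to (k choose t) instances without outliers amounts to a plain enumeration of all possible sets of inliers. In the following sketch, `solve_close_str` is a hypothetical callable standing in for any algorithm for (r, s)-CloseStr (with the bounds d_r and d_s already fixed); it is an assumption of this illustration and not part of the paper.

```python
from itertools import combinations


def solve_with_outliers(strings, t, solve_close_str):
    """Try every choice of k - t inliers and solve the outlier-free instance.

    solve_close_str takes a list of strings and returns a solution
    string or None. Returns (solution, inliers) or None.
    """
    for inliers in combinations(strings, len(strings) - t):
        solution = solve_close_str(list(inliers))
        if solution is not None:
            return solution, list(inliers)
    return None
```

Since the loop runs at most 2^k times, the total running time is fixed-parameter tractable with respect to k whenever the plugged-in solver is.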

Theorem 4 (r, s)-CloseStrwo(d s , t) ∈ FPT.
Proof Let S = {s_i | 1 ≤ i ≤ k} ⊆ Σ^ℓ and d_r, d_s, t ∈ N be an (r, s)-CloseStrwo instance. If d_s < k − t, then every solution string s must satisfy s = s_i, for some i, 1 ≤ i ≤ k (otherwise, each of the k − t inliers would contribute at least 1 to the distance sum). Moreover, if s_i is a solution string, then s_i is a solution string with respect to the set S' ⊆ S containing the k − t strings with the least Hamming distance from s_i. Consequently, we can compute a solution in polynomial time. If, on the other hand, k − t ≤ d_s, then k − t and t can be considered parameters; thus, k is a parameter and the result follows from (r, s)-CloseStrwo(k) ∈ FPT (see Theorem 3).
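The polynomial-time case d_s < k − t of this proof can be sketched as follows: every input string is tried as solution string and evaluated against its k − t closest input strings. The name `solve_many_inliers` is hypothetical, and the input is assumed to be given as a list.

```python
def hamming(u, v):
    """Hamming distance between two strings of equal length."""
    return sum(a != b for a, b in zip(u, v))


def solve_many_inliers(strings, d_r, d_s, t):
    """Solve an instance with d_s < k - t by testing every input string."""
    k = len(strings)
    assert d_s < k - t, "only valid if the number of inliers exceeds d_s"
    for s in strings:
        # The k - t input strings closest to s are the best choice of inliers.
        closest = sorted(hamming(s, u) for u in strings)[:k - t]
        if closest[-1] <= d_r and sum(closest) <= d_s:
            return s
    return None
```

Sorting the distances makes the choice of inliers explicit: if the k − t closest strings violate one of the bounds, then so does every other choice of k − t strings.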
We now turn to the parameter d r . As briefly mentioned in Sect. 2, the algorithm introduced in [20] to prove (r)-CloseStr(d r ) ∈ FPT has been extended in [6] with an additional branching that guesses whether a string s j should be considered an outlier or not; thus, yielding fixed-parameter tractability of (r)-CloseStrwo(d r , t). Moreover, we already sketched how the algorithm from [20] could be extended to (r, s)-CloseStr(d r ). Next, we combine these two approaches in a non-trivial way in order to obtain an fpt-algorithm for (r, s)-CloseStrwo(d r , t) (as explained in Sect. 2, this also provides a formal proof of Theorem 2).
The main problem about the general approach sketched in Sect. 2, i. e., starting with the majority string as a first candidate, is that whether a certain symbol is a majority symbol in a column depends on the choice of outliers. For example, both a and d are majority symbols of the first column of the matrix of Fig. 2a, but if t = 2 and s 1 and s 2 are declared as outliers, then, in case that the first symbol is not changed by the branching modifications, it is possible that d was a bad choice, since it causes more mismatches compared to a (with respect to the matrix from which the outliers are removed). In order to deal with this issue, we refine the concept of a majority string and tailor it to the outlier-variant.
Let $(S, d_s, d_r, t)$ be an instance of (r, s)-CloseStrwo($d_r$, $t$) and let $(S^*, s^*)$ be a solution for this instance. We say that a character $x$ is frequent in column $i$ if it has at least as many occurrences as a majority character minus $t$ (thus, for any $S' \subseteq S$, … it is the majority character of its column (except for disputed columns, in which we use an "undecided" character). In particular, note that the refined majority string is by definition a lower bound. A completion for $S' \subseteq S$ of a string $s$ over $\Sigma$ extended by the undecided character is the string obtained by replacing each occurrence of the undecided character by a majority character of the corresponding column in $S'$ (for example, in Fig. 2(b), a possible completion for $\{s_3, s_4, s_5, s_6\}$ of the refined majority string $s_m$ would be aacbccd).
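The notions of frequent characters and the refined majority string can be made concrete with a small Python sketch. Two points are assumptions of this sketch rather than statements of the text: a column is treated as disputed when it has at least two frequent characters, and the undecided character is represented by the hypothetical placeholder `"?"`, which is assumed not to occur in $\Sigma$.

```python
from collections import Counter
from typing import List, Set, Tuple

UNDECIDED = "?"  # hypothetical placeholder for the "undecided" character


def frequent_characters(column: Tuple[str, ...], t: int) -> Set[str]:
    """Characters with at least (majority count - t) occurrences."""
    counts = Counter(column)
    top = max(counts.values())
    return {c for c, n in counts.items() if n >= top - t}


def refined_majority(strings: List[str], t: int) -> Tuple[str, List[int]]:
    """Return the refined majority string and the disputed column indices.

    Assumption: a column is disputed if it has two or more frequent
    characters; such columns receive the UNDECIDED placeholder.
    """
    result = []
    disputed = []
    for i, column in enumerate(zip(*strings)):
        freq = frequent_characters(column, t)
        if len(freq) >= 2:
            disputed.append(i)
            result.append(UNDECIDED)
        else:
            result.append(next(iter(freq)))
    return "".join(result), disputed
```

With `t = 0` and no ties, the result coincides with the ordinary majority string; increasing `t` can only turn more columns into disputed ones.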
The following lemma states that the number of disputed columns of $S$ can be bounded in terms of $d_r$, which shall be a central building block of the following branching algorithm.

Lemma 2 Let $(S, d_s, d_r, t)$ be a positive instance of (r, s)-CloseStrwo($d_r$, $t$) with $D$ disputed columns. If $k \ge 5t$, then $D \le 4 d_r$.
Proof Let $(S^*, s^*)$ be a solution for the instance $(S, d_s, d_r, t)$. In a disputed column $i$, no character occurs more than $\frac{k+t}{2}$ times; hence, among the $k - t$ strings of $S^*$, there are at least $(k - t) - \frac{k+t}{2} = \frac{k-3t}{2}$ mismatches at position $i$. The disputed columns thus introduce at least $D \cdot \frac{k-3t}{2}$ mismatches. Since the overall number of mismatches is upper-bounded by $d_r (k - t)$, we have

$$D \le \frac{2 d_r (k - t)}{k - 3t} = 2 d_r \left(1 + \frac{2t}{k - 3t}\right),$$

and, with $k \ge 5t$, the upper bound $D \le 4 d_r$ follows.
We are now ready to present the fpt-algorithm for (r, s)-CloseStrwo($d_r$, $t$) and prove its correctness (an illustration of the algorithm is provided by Fig. 3).

Theorem 5 (r, s)-CloseStrwo($d_r$, $t$) ∈ FPT.

Proof Let $(S, d_s, d_r, t)$ be an instance of (r, s)-CloseStrwo($d_r$, $t$). We assume that $k \ge 5t$, since for all other instances, $k$ can be considered as a parameter and therefore they can be solved in fpt-time according to Theorem 3. The algorithm is presented as Algorithm 1 and in the following, we denote it by Solve CSO. The algorithm is formulated in a recursive way and in any recursive call, it receives as input a set $S'$ of the remaining input strings (i. e., the initial input strings with some outliers removed), a number $t'$ that denotes how many outlier-choices are left, a current candidate string $s'$ (over $\Sigma$ extended by the undecided character) and a number $d'$ denoting how many branching steps are left. Throughout this proof, we use $S'$, $t'$, $s'$ and $d'$ to denote the current input of a recursive call of Solve CSO in order to avoid any confusion with the components $S$, $d_s$, $d_r$ and $t$ of the (r, s)-CloseStrwo instance to be solved.
We first show that any recursive call to Solve CSO($S'$, $t'$, $s'$, $d'$) returns after fpt-time with respect to $d_r$, $d'$ and $t'$.
In the following, we say that a tuple $(S', t', s', d')$ is valid if $|S'| - t' = |S| - t$, there exists an optimal solution $(S^*, s^*)$ for which $S^* \subseteq S'$, $|S^*| = |S'| - t'$, $d_H(s', s^*) \le d'$, and $s'$ is a lower bound for $s^*$ (in the sense defined above). A call of the algorithm is valid if its parameters form a valid tuple; its witness is the pair $(S^*, s^*)$.
Claim 2 Any valid call to Solve CSO either directly returns a solution or performs at least one recursive valid call.

Proof of Claim 2 Let …

In the following cases, we can thus assume that the algorithm reaches Line 5. Indeed, if it returns on Line 3, then it returns a solution, and if it returns on Line 4, then we have $d' = t' = 0$, which is dealt with in Case 1 above (the algorithm may not return on this line when it has a valid input). We can thus define $s_j$ to be the string selected in Line 5.
Case 3 $s_j \in S' \setminus S^*$. Then in particular $t' > 0$; and since $S^* \subseteq S' \setminus \{s_j\}$, the recursive call in Line 7 is valid, with the same witness $(S^*, s^*)$.

Case 4 $s_j \in S^*$, $d' = 0$ and $t' > 0$. Then $s' = s^*$. Let $s_{j'}$ be any string of $S' \setminus S^*$, and let $S^+ = (S^* \setminus \{s_j\}) \cup \{s_{j'}\}$. Then the pair $(S^+, s^*)$ is a solution, since $d_H(s^*, s_{j'}) \le d_H(s^*, s_j)$ by definition of $s_j$. Thus the recursive call on Line 7 is valid, with witness $(S^+, s^*)$.

Next, we observe that from now on, we can assume that $d' > 0$ and $t' > 0$. Indeed, if $d' = 0$, $t' = 0$, then we have already dealt with this situation in Case 1. If $d' = 0$, $t' > 0$, then either Case 3 applies (i. e., if $s_j \in S' \setminus S^*$) or Case 4 applies (i. e., if $s_j \in S^*$). Finally, if $d' > 0$, $t' = 0$, then either Case 2 applies (i. e., if $\forall s \in S' : d_H(s, s') \le d_r$) or, depending on whether or not $s_j \in S^*$, either Case 3 applies or Case 5 applies.

Moreover, with Cases 3 and 5, we can assume that $s_j \in S^*$ and $d_H(s_j, s') \le d_r$ (i. e., $d_H(s, s') \le d_r$ for all $s \in S^*$). In this case no character from $s_j$ can be used to improve our current solution, so the character switching procedure of Line 13 will not improve the solution; but $s_j$ is still part of our witness set $S^*$, so it is not clear a priori that we can remove $s_j$ from our current solution, i. e., that the recursive call on Line 7 is valid. We will now show that also in this case the recursive call on Line 7 is valid. Let $s^+$ be obtained from $s'$ by filling the undecided positions of $s'$ with the corresponding symbols of $s^*$. We now show that $(S^*, s^+)$ is a solution and we start with showing that $s^+$ satisfies the radius bound. To this end, let $s \in S^*$ be chosen arbitrarily. …

In particular, Claim 2 implies that any valid call to Solve CSO returns a solution. Indeed, if it does not directly return a solution, then it receives a solution of a more constrained instance from a valid recursive call, which is returned on Line 8 or 14.
Next, we show that starting the algorithm with parameters $S' = S$, $t' = t$, $s' = s_m$ and $d' = 2 d_r + D$ (where $D$ is the number of disputed columns) is a valid call.

Proof of Claim 3
Consider a solution (S * , s * ). We need to check whether d H (s * , s m ) ≤ 2d r + D, and whether s m is a lower bound of s * . The latter follows by definition and has already been observed above. String s * can be seen as a solution of (r, s)-CloseStr over S * , d r , d s , thus, Lemma 1 implies that the distance between s * and the majority string of S * is at most 2d r . Hence there are at most 2d r mismatches between s m and s * in non-disputed columns (since in those columns, the majority characters are identical in S and S * ). Adding the D mismatches from disputed columns, we get the 2d r + D upper bound.
(Claim 3)

Finally, we note that, according to Lemma 2 (recall that we initially made the assumption $k \ge 5t$), $D \le 4 d_r$. Consequently, the above claims imply that calling Solve CSO with parameters $S$, $t$, $s_m$, $6 d_r$ solves the (r, s)-CloseStrwo instance in time $O^*((d_r + 1)^{6 d_r} \, 2^{6 d_r + t})$.
Next, we consider the (r)- and (s)-variants of CloseStrwo. With respect to (r)-CloseStrwo, the fixed-parameter tractability with respect to $k$ and $(|\Sigma|, d_r, k - t)$ are reported as open problems in [6]. Since Theorem 3 also applies to …

Fig. 4 … (c) is the same instance, but with the appropriate outliers crossed out and a solution string representing a $k_c$-clique, and (d) shows the (s)-CloseSubstr instance obtained from the graph by the reduction of Theorem 7 [where $m = k_c + 1 = 4$, $d_s = (|E|(k_c + 2) + 1) q k_c + |E| k_c - \frac{k_c (k_c - 1)}{2} = 234$; note that by definition, each of the first $q = 2$ strings, i. e., the strings $V_j$, is repeated $|E|(k_c + 2) + 1 = 36$ times], the appropriate substrings of length $k_c + 1$ highlighted in grey and a solution string representing a $k_c$-clique.

… CloseStr is hard, while its (s)-variant is trivial. For the substring-variant we have a quite different situation, since both single-bound variants of CloseSubstr are hard. We shall see next that the outlier-variant resembles the substring-variant in this regard, i. e., both single-bound variants are hard [for the (r)-variant this is known [6], while for the (s)-variant this is established by the following theorem].
We use a reduction from the problem Multi-Coloured Clique (which is W[1]-hard, see [12]). The problem Multi-Coloured Clique is identical to the standard parameterisation of Clique (i. e., we want to find a clique of a given size $k_c$, and $k_c$ is also the parameter), but the input graph $G = (V, E)$ has a partition $V = V_1 \cup \dots \cup V_{k_c}$, such that every $V_i$, $1 \le i \le k_c$, is an independent set (we denote the parameter by $k_c$ to avoid confusion with the number of input strings $k$).
Let $G = (V_1 \cup \dots \cup V_{k_c}, E)$ be a Multi-Coloured Clique instance. Without loss of generality, we assume that, for some $q \in \mathbb{N}$, $V_i = \{v_{i,1}, v_{i,2}, \dots, v_{i,q}\}$ for every $i$, $1 \le i \le k_c$, i. e., each vertex has an index depending on its colour-class and its rank within its colour-class.

… $= v_{i',j'}$ and all other non-defined positions are filled with symbols from $\Gamma$ such that each $x \in \Gamma$ has exactly one occurrence in the strings $s_e$, $e \in E$. We set $S = \{s_e \mid e \in E\}$, $t = |E| - \binom{k_c}{2}$ and $d_s = \binom{k_c}{2} (k_c - 2)$. See Fig. 4(a), (b) and (c) for an illustration of the reduction and the following proof.
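The reduction can be sketched in Python as follows. Since the defining sentence for the strings $s_e$ is incomplete here, the sketch relies on an assumption: for an edge $e = \{v_{i,j}, v_{i',j'}\}$, the string $s_e$ has length $k_c$, carries $v_{i,j}$ at position $i$ and $v_{i',j'}$ at position $i'$, and every other position holds a fresh symbol of $\Gamma$ that is used only once. Symbols are modelled as tuples, and the names `build_outlier_instance` and `best_distance_sum` are hypothetical; the sketch also assumes $|E| \ge \binom{k_c}{2}$.

```python
from itertools import count
from math import comb


def build_outlier_instance(k_c, edges):
    """Build (strings, t, d_s) from a multi-coloured graph.

    edges: list of ((i, j), (i2, j2)) with colour classes i != i2 in 1..k_c.
    Assumed shape of s_e: endpoints at positions i and i2, fresh symbols
    (each occurring exactly once overall) everywhere else.
    """
    fresh = count()
    strings = []
    for (i, j), (i2, j2) in edges:
        s_e = []
        for pos in range(1, k_c + 1):
            if pos == i:
                s_e.append(("v", i, j))
            elif pos == i2:
                s_e.append(("v", i2, j2))
            else:
                s_e.append(("gamma", next(fresh)))
        strings.append(tuple(s_e))
    t = len(edges) - comb(k_c, 2)          # number of outliers
    d_s = comb(k_c, 2) * (k_c - 2)         # distance sum bound
    return strings, t, d_s


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def best_distance_sum(strings, candidate, t):
    """Distance sum of candidate to its |strings| - t closest strings."""
    dists = sorted(hamming(candidate, s) for s in strings)
    return sum(dists[: len(strings) - t])
```

For a candidate that lists the vertices of a $k_c$-clique, each of the $\binom{k_c}{2}$ clique edges contributes exactly $k_c - 2$ mismatches, which is where the bound $d_s$ comes from.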

The Substring-Variant
In this section, we consider the substring-variants of CloseStr, i. e., the different variants of the problem CloseSubstr. Similar to CloseStr, all parameterisations of the (r)-variant, and almost all parameterisations of the (s)-variant, are already settled in the literature (while the variant with both bounds has not yet been considered in the literature). As has been done in Sect. 2 for (r, s)-CloseStr, we are able to classify all parameterisations of (r, s)-CloseSubstr (and its single-bound variants) with respect to the parameters $\ell$, $k$, $m$, $d_r$, $d_s$ and $|\Sigma|$ into either fixed-parameter tractable or W[1]-hard (thus, also solving the case left open in the literature with respect to (s)-CloseSubstr).
With respect to the (s)-variant, the status of (s)-CloseSubstr($\ell$) is unknown, which is mentioned as an open problem in [29]. We shall first close this gap by proving this parameterisation to be W[1]-hard.
We devise a reduction from Multi-Coloured Clique.
, each vertex has an index depending on its colour-class and its rank within its colour-class. Let Σ = V ∪ {$, }. For every j, 1 ≤ j ≤ q, we list all jth elements of the colour-classes as a string where |E e | = k c , the positions i and i of E e are v i, j and v i , j , respectively, and all remaining positions are . The (s)-CloseSubstr instance is now defined as follows. Let S contain N = |E|(k c + 2) + 1 occurrences of each V j , 1 ≤ j ≤ q, and one occurrence of each E e , e ∈ E, and let m = k c + 1. See Fig. 4a, d for an illustration of the reduction.
For proving the correctness of the reduction, we first extend the notions of radius optimal and distance sum optimal to sets $S \subseteq \Sigma^{\le \ell}$ and strings $s \in \Sigma^{m}$ in the natural way by taking all sets $S'$ of length-$m$ substrings of the strings in $S$ into account. The next lemma shows that distance sum optimal strings (with respect to $S$ and $m$) are basically lists of vertices from each colour-class.
Proof We first note that a string $s \in \Sigma^{k_c+1}$ is a majority string of $\{V_j \mid 1 \le j \le q\}$ if and only if $s \in \{\$\} \cdot V_1 \cdot V_2 \cdots V_{k_c}$. More precisely, the first column of $\{V_j \mid 1 \le j \le q\}$ is $\$^q$ and, for every $i$, $2 \le i \le k_c + 1$, the $i$th column of $\{V_j \mid 1 \le j \le q\}$ contains every vertex from $V_{i-1}$ exactly once, so every $v \in V_{i-1}$ has majority in column $i$. Now let $s \in \Sigma^{k_c+1}$ be such that $s$ is not a majority string for $\{V_j \mid 1 \le j \le q\}$, which implies that $s$ is not distance sum optimal with respect to $\{V_j \mid 1 \le j \le q\}$. Since every $V_j$, $1 \le j \le q$, has $N = |E|(k_c + 2) + 1$ occurrences in $S$, any majority string for $\{V_j \mid 1 \le j \le q\}$, in comparison with $s$, causes at least $|E|(k_c + 2) + 1$ fewer mismatches with respect to all occurrences of the strings $\{V_j \mid 1 \le j \le q\}$. Since the total number of symbols of the remaining strings in $\{E_e \mid e \in E\}$ is $|E|(k_c + 2)$, this cannot be compensated, which means that a majority string for $\{V_j \mid 1 \le j \le q\}$ has lower distance sum than $s$ and therefore $s$ is not distance sum optimal with respect to $S$ and $m$.

Now let $s$ be distance sum optimal with respect to $S$ and $m$. From Lemma 3, we can conclude that $s = \$ \, v_{1,r_1} v_{2,r_2} \dots v_{k_c,r_{k_c}}$ for some $r_j \in \{1, 2, \dots, q\}$, $1 \le j \le k_c$. Let $K$ be the corresponding set of vertices induced by $s$, i. e., $K = \{v_{1,r_1}, v_{2,r_2}, \dots, v_{k_c,r_{k_c}}\}$.

Lemma 4 Let $e \in E$. The optimal distance between $s$ and a length-$(k_c + 1)$ substring of $E_e$ is $k_c - 1$ if $e \subseteq K$, and $k_c$ otherwise.

Proof We first recall that $s[1]$ …

Using the lemmas from above, we can now show the correctness of the reduction.

Theorem 7 (s)-CloseSubstr($\ell$, $m$) is W[1]-hard.
Proof We first note that $\ell = k_c + 2$ and $m = k_c + 1$; thus, the parameters are bounded by a function in $k_c$.
Let $s \in \Sigma^{k_c+1}$ be distance sum optimal with respect to $S$ and $m$, and let $K$ be the corresponding set of vertices. We first note that the total distance from $s$ to the $N$ copies of the strings $V_j$, $1 \le j \le q$, is exactly $N q k_c$. According to Lemma 4, for every $e \in E$, the optimal distance between $s$ and the respective substring of $E_e$ is $k_c - 1$ if $e \subseteq K$, and $k_c$ otherwise. Hence, the total distance sum from $s$ to the respective substrings of $E_e$, $e \in E$, is $|E| k_c - r$, where $r = |\{e \in E \mid e \subseteq K\}|$, and the total distance sum between $s$ and $S$ is therefore $N q k_c + |E| k_c - r$. This implies that the distance sum between $s$ and $S$ is $N q k_c + |E| k_c - \binom{k_c}{2}$ if and only if $r = \binom{k_c}{2}$ if and only if $K$ is a clique of size $k_c$. Consequently, the above reduction, with the addition of $d_s = N q k_c + |E| k_c - \binom{k_c}{2}$, is a parameterised reduction from Multi-Coloured Clique to (s)-CloseSubstr($\ell$, $m$).
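As a quick arithmetic check of the bound $d_s$, the following Python sketch evaluates the formula from the proof. The function name `closesubstr_distance_sum_bound` is hypothetical. The example values $k_c = 3$ and $q = 2$ are those quoted with Fig. 4; $|E| = 7$ is inferred from the quoted repetition count $|E|(k_c + 2) + 1 = 36$ and is therefore an assumption of this sketch.

```python
from math import comb


def closesubstr_distance_sum_bound(num_edges: int, k_c: int, q: int) -> int:
    """d_s = N*q*k_c + |E|*k_c - C(k_c, 2), with N = |E|*(k_c + 2) + 1."""
    n_copies = num_edges * (k_c + 2) + 1   # N: repetitions of each V_j
    return n_copies * q * k_c + num_edges * k_c - comb(k_c, 2)


# Values matching the Fig. 4 example: N = 36, so 36*2*3 + 7*3 - 3.
print(closesubstr_distance_sum_bound(num_edges=7, k_c=3, q=2))  # 234
```

The term $\binom{k_c}{2}$ is the only part that depends on the clique: it is the largest possible number $r$ of edges contained in $K$.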
Theorem 7 together with known results from the literature completely settles the parameterised complexity of (s)-CloseSubstr. See Table 3 for an illustration.

Moving on to the problem (r, s)-CloseSubstr, we first observe that reducing (s)-CloseSubstr to (r, s)-CloseSubstr by setting $d_r = m$ is a parameterised reduction from (s)-CloseSubstr($\ell$, $m$) to (r, s)-CloseSubstr($\ell$, $m$, $d_r$), which implies the following corollary.

Kernelisation
Before we can discuss kernelisation results for the consensus string problems, we need a few more preliminary concepts and results (see [18] for a recent textbook on kernelisation). First, we recall the concept of a polynomial parameter transformation from [4]. A polynomial parameter transformation from a parameterised problem $P_1$ to a parameterised problem $P_2$ is a polynomial time computable function $f$ that maps $P_1$ instances to $P_2$ instances and a polynomial $p$, such that, for every $P_1$ instance $(x, k)$ with $f(x, k) = (x', k')$, we have
- $(x, k)$ is a positive instance if and only if $(x', k')$ is a positive instance,
- $k' \le p(k)$.

Theorem 9 ([4]) Let $P_1$ and $P_2$ be parameterised problems such that the classical problem derived from $P_1$ is NP-complete, the classical problem derived from $P_2$ is in NP, and there is a polynomial parameter transformation from $P_1$ to $P_2$. If $P_2$ has a polynomial kernel, then $P_1$ has a polynomial kernel.
While Theorem 9 allows us to carry over the existence of a polynomial kernel from one problem to another, it also allows us to show that a problem most likely does not have a polynomial kernel. More precisely, if a parameterised problem $P$ has no polynomial kernel (with respect to some complexity theoretical assumption) and there is a polynomial parameter transformation from $P$ to some parameterised problem $P'$, then $P'$ also has no polynomial kernel (with respect to the same assumption).
Next, we recall the concept of a polynomial equivalence relation from [5]. Let $R$ be an equivalence relation over $\Delta^*$. The relation $R$ is a polynomial equivalence relation if the following conditions are satisfied:
- For given $x, y \in \Delta^*$, we can decide whether $x$ and $y$ are $R$-equivalent in polynomial time.
- For every finite $X \subseteq \Delta^*$, the relation $R$ partitions $X$ into a number of classes that is polynomially bounded in $\max\{|x| \mid x \in X\}$.
Finally, we recall the concept of a cross-composition from [5]. Let $K \subseteq \Delta^*$ be a problem (interpreted as a language), let $R$ be a polynomial equivalence relation on $\Delta^*$ and let $P \subseteq (\Delta^* \times \mathbb{N})$ be a parameterised problem. A cross-composition from $K$ into $P$ (with respect to $R$) is an algorithm that, given instances $x_1, x_2, \dots, x_q \in \Delta^*$ of $K$ that belong to the same $R$-equivalence class, takes time polynomial in $\sum_{i=1}^{q} |x_i|$ and produces an instance $(y, k) \in \Delta^* \times \mathbb{N}$ such that the following holds:
- $k$ is polynomially bounded in $\max\{|x_i| \mid 1 \le i \le q\} + \log(q)$,
- $(y, k)$ is a positive instance of $P$ if and only if $x_i$ is a positive instance of $K$ for some $i$, $1 \le i \le q$.

The purpose of cross-compositions is demonstrated by the following theorem: If there is a cross-composition of an NP-hard problem into a parameterised problem $P$, then $P$ does not have a polynomial kernel, unless coNP ⊆ NP/poly.
Note that coNP ⊆ NP/poly implies a collapse of the polynomial hierarchy and is considered unlikely. We are now ready to present the kernelisation results, which shall be proved by applying the framework provided above.
For the ($d_r$)-variants of CloseStr and CloseSubstr, the question whether the fixed-parameter tractable variants have a polynomial kernel has already been investigated in [2]. We restate the results relevant for us:

Proof Transforming an (r)-CloseStr instance into an (r, s)-CloseStr instance by setting $d_s = k d_r$ is a polynomial parameter transformation from the problem (r)-CloseStr($d_r$, $\ell$, $|\Sigma|$) to (r, s)-CloseStr($d_r$, $\ell$, $|\Sigma|$); thus, Theorem 11 implies the first statement of the proposition.
Next, we briefly recall the $O(k^2 d_r \log k)$ kernel for (r)-CloseStr($k$, $d_r$) (see [2,20]). For an (r)-CloseStr($k$, $d_r$)-instance, we call a column dirty if it contains at least two different symbols. By the pigeon-hole principle, a positive (r)-CloseStr($k$, $d_r$)-instance can have at most $k \cdot d_r$ dirty columns; thus, deleting all columns that are not dirty yields a kernel of size $O(k^2 d_r \log k)$ and it is obvious that this is also a kernel for (r, s)-CloseStr($k$, $d_r$). This proves the second statement.
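The dirty-column reduction described above can be sketched in a few lines of Python. The function name `dirty_column_kernel` is hypothetical, and the sketch only performs the column deletion; re-encoding the remaining symbols with $O(\log k)$ bits each, which the size bound relies on, is left out.

```python
from typing import List, Optional


def dirty_column_kernel(strings: List[str], d_r: int) -> Optional[List[str]]:
    """Delete all columns that are not dirty.

    A column is dirty if it contains at least two different symbols.
    Returns None if there are more than k * d_r dirty columns, in which
    case the instance cannot be positive; otherwise returns the input
    strings restricted to their dirty columns.
    """
    k = len(strings)
    dirty = [i for i, col in enumerate(zip(*strings)) if len(set(col)) > 1]
    if len(dirty) > k * d_r:
        return None
    return ["".join(s[i] for i in dirty) for s in strings]
```

In a clean column, choosing the unanimous symbol causes no mismatch at all, so removing such columns changes neither the radius nor the distance sum of an optimal solution.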
If $k > d_s$, then the only possible solution strings are the input strings, which can be checked in polynomial time (and the instance can accordingly be reduced to a trivial positive or negative kernel of constant size).

With respect to (r, s)-CloseStr, this leaves the case open where only $k$ (or $k$ and $|\Sigma|$, which, due to the dependency $|\Sigma| \le k$ (see [20]), is the same question) is a parameter (regarding this case, note that for (r)-CloseStr($k$) no combinatorial kernel or combinatorial fpt-algorithm is known).
Next, we take a look at kernelisation questions for (r, s)-CloseSubstr.

Proof Transforming an (r)-CloseSubstr instance into an (r, s)-CloseSubstr instance by setting $d_s = k d_r$ is a polynomial parameter transformation from (r)-CloseSubstr($k$, $m$, $d_r$, $|\Sigma|$) to (r, s)-CloseSubstr($k$, $m$, $d_r$, $d_s$, $|\Sigma|$); thus, Theorem 11 implies the first statement of the proposition.
Since we can assume that $d_r, d_s \le \ell k$, any (r, s)-CloseSubstr($\ell$, $k$) instance has size $O(\ell k)$, which proves the second point.
If $d_s < k$, then the solution string must be a substring of an input string (see Remark 1), which can be checked in polynomial time (and the instance can accordingly be reduced to a trivial positive or negative kernel of constant size). If, on the other hand, $k \le d_s$, then the instance has size $O(\ell d_s)$.

For the outlier-variant, no kernelisation lower bounds are known so far. However, the following can be concluded from [2].

Proposition 3 The following problems have no polynomial kernel unless coNP ⊆ NP/poly.
As our main contribution to the question of kernelisation hardness of consensus string problems, we present a cross-composition from (r)-CloseStr into (r, s)-CloseStrwo, which allows us to rule out a polynomial kernel for the parameterisation ($d_r$, $d_s$, $\ell$, $(k - t)$, $|\Sigma|$) of (r, s)-CloseStrwo.

Proof We prove the result by a cross-composition of (r)-CloseStr (over the alphabet $\Sigma = \{0, 1\}$) into (r, s)-CloseStrwo($d_r$, $d_s$, $\ell$, $(k - t)$, $|\Sigma|$) (note that (r)-CloseStr is NP-complete for binary alphabets [19]). We first recall that an (r)-CloseStr instance is a tuple $(S, d_r)$ with $S = \{s_i \mid 1 \le i \le k\} \subseteq \Sigma^{\ell}$ for some $\ell \in \mathbb{N}$, and $d_r \in \mathbb{N}$; in the following, we denote its total size by $|(S, d_r)|$. We note that $|(S, d_r)| = O(k \ell \log(|\Sigma|) + \log(d_r)) = O(k \ell)$ (since we assume $|\Sigma| = 2$ and $d_r \le \ell$).
We define an equivalence relation $\sim$ over the set of (r)-CloseStr instances as follows. For $j \in \{1, 2\}$, let $S_j = \{s_{j,i} \mid 1 \le i \le k_j\} \subseteq \Sigma^{\ell_j}$ and $d_{r,j} \in \mathbb{N}$ be two (r)-CloseStr instances. Then $(S_1, d_{r,1}) \sim (S_2, d_{r,2})$ if $k_1 = k_2$, $\ell_1 = \ell_2$ and $d_{r,1} = d_{r,2}$. For any two instances $(S_1, d_{r,1})$ and $(S_2, d_{r,2})$, it can be checked in time polynomial in $|(S_1, d_{r,1})| + |(S_2, d_{r,2})|$ whether or not $(S_1, d_{r,1}) \sim (S_2, d_{r,2})$. Let $X$ be a finite set of (r)-CloseStr instances with $k$, $\ell$ and $d_r$ being the largest number of strings, length of strings and radius bound that occur in any instance of $X$ (note that these parameters can occur in different instances). Obviously, the number of equivalence classes of $X$ (with respect to relation $\sim$) is bounded by $O(k \ell d_r)$. Moreover, each of $k$, $\ell$ and $d_r$ is bounded by $\max\{|x| \mid x \in X\}$ (note that $d_r$ is bounded by $\max\{|x| \mid x \in X\}$ since we can assume $d_r \le \ell$ for all instances). This implies that $\sim$ partitions $X$ into at most $(\max\{|(S, d_r)| \mid (S, d_r) \in X\})^{O(1)}$ equivalence classes. Consequently, $\sim$ is a polynomial equivalence relation.

Now let $(S_1, d_r), (S_2, d_r), \dots, (S_q, d_r)$ be $\sim$-equivalent (r)-CloseStr instances, where, for the sake of convenience, $S_i = \{s_{i,1}, s_{i,2}, \dots, s_{i,k}\} \subseteq \Sigma^{\ell}$, $1 \le i \le q$. For every $i$, $1 \le i \le q$, let $B_i$ denote the binary representation of $i$ with exactly $\lceil \log(q) \rceil$ bits, and let $C_i = (B_i)^{2 d_r + 1}$ (i. e., $C_i$ is the $(2 d_r + 1)$-fold repetition of the binary string $B_i$). Moreover, for every $i$, $1 \le i \le q$, let $S'_i = \{s'_{i,1}, s'_{i,2}, \dots, s'_{i,k}\}$, where, for every $j$, $1 \le j \le k$, $s'_{i,j} = s_{i,j} C_i$. Finally, let the (r, s)-CloseStrwo instance be $(S', d'_r, d'_s, t)$ with $S' = \bigcup_{i=1}^{q} S'_i$, $d'_r = d_r$, $d'_s = k d_r$ and $t = (q - 1) k$. Note that $(S', d'_r, d'_s, t)$ is a valid (r, s)-CloseStrwo instance with $k' = q k$ input strings that are all of the same length $\ell' = \ell + (2 d_r + 1) \lceil \log(q) \rceil$.
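The construction can be illustrated by the following Python sketch. The function name `cross_compose` is hypothetical. One deliberate deviation is an assumption of this sketch: instances are numbered from $0$ to $q - 1$ instead of $1$ to $q$, so that every index is guaranteed to fit into exactly $\lceil \log_2(q) \rceil$ bits.

```python
from typing import List, Tuple


def cross_compose(instances: List[List[str]],
                  d_r: int) -> Tuple[List[str], int, int, int]:
    """Compose q equivalent (r)-CloseStr instances into one outlier instance.

    instances: q lists, each with k binary strings of the same length.
    Returns (S', d_r', d_s', t) with d_r' = d_r, d_s' = k * d_r and
    t = (q - 1) * k.
    """
    q = len(instances)
    k = len(instances[0])
    bits = (q - 1).bit_length()            # equals ceil(log2(q)) for q >= 1
    composed = []
    for i, strings in enumerate(instances):
        # B_i: binary code of the (0-based) instance index, padded to `bits`.
        b_i = format(i, "b").zfill(bits) if bits else ""
        # C_i: (2 d_r + 1)-fold repetition of B_i, appended to every string.
        c_i = b_i * (2 * d_r + 1)
        composed.extend(s + c_i for s in strings)
    return composed, d_r, k * d_r, (q - 1) * k
```

Because two different codes differ in at least one bit, the suffixes of strings from different instances differ in at least $2 d_r + 1$ positions, so no string within radius $d_r$ of one of them can be within radius $d_r$ of the other.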
This construction can clearly be computed in polynomial time and, in order to show that it is a correct cross-composition of (r)-CloseStr into (r, s)-CloseStrwo($d_r$, $d_s$, $\ell$, $(k - t)$, $|\Sigma|$), we have to prove the following claims.

Proof of Claim
(Claim 1)

… for every $s \in \Sigma^{\ell'}$, which is a contradiction. Hence, $i_1 = i_2 = \dots = i_k$, which implies that $A = S'_i$ for some $i$, $1 \le i \le q$. Since all strings in $S'_i$ have the same length-$((2 d_r + 1) \lceil \log(q) \rceil)$ suffix $C_i$ and since we obtain the strings of $S_i$ if we remove this common suffix, we conclude that $r_H(s[1..\ell], S_i) \le r_H(s, S'_i) \le d'_r = d_r$, which implies that $(S_i, d_r)$ is a positive (r)-CloseStr instance.
In order to prove the if direction, let, for some $i$, $1 \le i \le q$, $(S_i, d_r)$ be a positive (r)-CloseStr instance, i. e., there is an $s \in \Sigma^{\ell}$ with $r_H(s, S_i) \le d_r$. Hence, $r_H(s C_i, S'_i) \le d_r = d'_r$, which, in particular, implies that $s_H(s C_i, S'_i) \le k d_r = d'_s$. Consequently, $(S', d'_r, d'_s, t)$ is a positive (r, s)-CloseStrwo($d_r$, $d_s$, $\ell$, $(k - t)$, $|\Sigma|$) instance as witnessed by the inlier-set $S'_i \subseteq S'$ and solution string $s C_i$. (Claim 2) This concludes the proof.

Conclusions
In this section, we discuss our results and state the most interesting open problems that are left for further research.
Our main positive algorithmic result is the branching algorithm from Theorem 5. It demonstrates that the fixed-parameter tractability of CloseStr with respect to the practically most relevant parameter d r is robust in the sense that we can afford to also exclude outliers (as long as their number is also treated as a parameter) and add a distance sum bound (see also the motivation for such a problem variant at the beginning of Sect. 1).
Moreover, we provide a complete "fixed-parameter tractability map" for the problems CloseStr and CloseSubstr, i. e., their general variants and their single-bound variants, for all possible combinations of the parameters $k$, $\ell$, $d_r$, $d_s$, $|\Sigma|$ and $m$. This is done by complementing the existing work with respect to the single-bound variants and by adapting these results to the general variant.
In this regard, (r, s)-CloseStr shows the same positive tractability results as the (r)-variant, i. e., it is fixed-parameter tractable for all single parameters except $|\Sigma|$. With respect to the substring-variant, our results demonstrate that adding a distance sum bound may increase the complexity. For example, while the parameter $\ell$ is sufficient for fixed-parameter tractability with respect to the radius variant, we get a W[1]-hard problem if we add a distance sum bound, even if we additionally take $m$ and $d_r$ as parameters. In order to maintain fixed-parameter tractability, we would have to also treat the distance sum bound as a parameter, or to take $k$ as a parameter. In general, for the general or single-bound variants of the substring-variant, things do not look good fpt-wise.
The only questions left open with respect to CloseStr and CloseSubstr are about whether the fixed-parameter tractable variants allow polynomial kernels. More precisely, it is unknown whether the general or the radius variant of CloseStr(k) allows a polynomial kernel and for the substring-variant several cases are open, which are summarised in Question 3. With respect to (r)-CloseStr(k), there is a more important question still open: the only fixed-parameter tractability results rely on integer linear programming and a combinatorial fpt-algorithm is still unknown. This has already been reported in [20] and is explicitly stated as an open problem in the survey [7]. In particular, we stress the fact that several of our fixed-parameter tractability results extend or directly use the ILP approach from [20] and therefore have the same issue; namely, this is the case for (r, s)-CloseStr(k) and (r, s)-CloseStrwo(k) and their single-bound variants.
With respect to the outlier-variant, our "fixed-parameter tractability map" is still rather incomplete (see also Table 2), both in the sense that for several parameterisations fixed-parameter tractability is unknown and for some fixed-parameter tractable variants it is unknown whether they allow polynomial kernels. The existing results show that, for fixed-parameter tractability, the single parameter $k$ is sufficient (although based on ILP) and parameterising by the number of outliers $t$ is also enough if at least one of the parameters $\ell$, $d_r$, $d_s$ or $k - t$ is taken into consideration as well. Unfortunately, $t$ and $|\Sigma|$ is not enough and, surprisingly, for any other combination containing $|\Sigma|$ (except the trivial one $(|\Sigma|, \ell)$), we were not able to prove fixed-parameter tractability or W[1]-hardness. This is worth pointing out, since the parameter $|\Sigma|$ is rather important due to the fact that in practical scenarios it can often be assumed to be of rather small constant size. Consequently, the most important open question with respect to the outlier-variant is whether fixed-parameter tractability can be achieved by coupling $|\Sigma|$ with any of the parameters $d_r$, $d_s$ or $k - t$ (see Question 1).