A $(1.4 + \epsilon)$-approximation algorithm for the 2-Max-Duo problem

The maximum duo-preservation string mapping (Max-Duo) problem is the complement of the well-studied minimum common string partition problem, both of which have applications in many fields including text compression and bioinformatics. $k$-Max-Duo is the restricted version of Max-Duo, where every letter of the alphabet occurs at most $k$ times in each of the strings, which is readily reduced to the well-known maximum independent set (MIS) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, 2-Max-Duo can then be approximated arbitrarily close to 1.8 using the state-of-the-art approximation algorithm for the MIS problem on bounded-degree graphs. 2-Max-Duo was proved APX-hard and very recently a $(1.6 + \epsilon)$-approximation algorithm was claimed, for any $\epsilon > 0$. In this paper, we present a vertex-degree reduction technique, based on which we show that 2-Max-Duo can be approximated arbitrarily close to 1.4.


Introduction
The minimum common string partition (MCSP) problem is a well-studied string comparison problem in computer science, with applications in fields such as text compression and bioinformatics. In both fields, string (or sequence) comparison is routine work. A commonly used measure of the similarity between two strings is the edit distance, which is the minimum number of edit operations required to transform one string into the other. At the finest scale, the edit operations involve a single character of a string, including insertion, deletion, and substitution. When comparing two long strings such as the whole genomes of multiple species, long-range operations become more interesting, leading to the genome rearrangement problems (Chen et al. 2005; Swenson et al. 2008). In particular, a transposition operation cuts out a substring and inserts it back at another position in the string. The problem of partitioning one string into a minimum number of substrings such that a reshuffle of them becomes the other string is referred to as the minimum common string partition (MCSP) problem.
MCSP was first introduced by Goldstein et al. (2004), and can be defined as follows: Consider two length-$n$ strings $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$ over some alphabet $\Sigma$, such that $B$ is a re-ordering of $A$. Let $P_A$ be a partition of $A$, which is a multi-set of substrings whose concatenation in a certain order becomes $A$. The cardinality of $P_A$ is the number of substrings in $P_A$. The MCSP problem asks to find a minimum-cardinality partition $P_A$ of $A$ which is also a partition of $B$. $k$-MCSP denotes the restricted version of MCSP where every letter of the alphabet occurs at most $k$ times in each of the two given strings. Goldstein et al. (2004) have shown that the MCSP problem is NP-hard and APX-hard, even when $k = 2$. Several approximation algorithms (Chen et al. 2005; Chrobak et al. 2004; Cormode and Muthukrishnan 2007; Goldstein et al. 2004; Waleń 2006, 2007) have been proposed since 2004, among which the current best results are an $O(\log n \log^* n)$-approximation algorithm for the general MCSP and an $O(k)$-approximation algorithm for $k$-MCSP. On the other hand, MCSP has been proved to be fixed-parameter tractable (FPT) with respect to the cardinalities of the parts in an optimal partition and/or a combination of the cardinalities and $k$ (Damaschke 2008; Jiang et al. 2012; Bulteau et al. 2013; Bulteau and Komusiewicz 2014).
In a given string $A$, a pair of adjacent letters in $A$ is called a duo of the string $A$ (Goldstein et al. 2004); a length-$\ell$ substring in a partition of $A$ preserves $\ell - 1$ duos of $A$. The complementary objective to that of MCSP is to maximize the number of duos preserved in a common partition of $A$ and $B$, and this optimization problem is referred to as the maximum duo-preservation string mapping (MPSM) problem by Chen et al. (2014). In this paper, we refer to the MPSM problem as Max-Duo, mostly because the acronym MPSM looks too similar to the other acronyms. Analogously, $k$-Max-Duo is the restricted version of Max-Duo where every letter of the alphabet occurs at most $k$ times in each given string. Max-Duo was proved to be FPT by Beretta et al. (2016a, b), with respect to the number of preserved duos in the optimal partition. In this paper, we focus on 2-Max-Duo and design an improved approximation algorithm for it.
Along with Max-Duo, Chen et al. (2014) introduced the constrained maximum induced subgraph (CMIS) problem, in which one is given an $m$-partite graph $G = (V_1, V_2, \ldots, V_m, E)$ with each $V_i$ having $n_i^2$ vertices arranged in an $n_i \times n_i$ matrix, and the goal is to find $n_i$ vertices in each $V_i$ from different rows and different columns such that the number of edges in the induced subgraph is maximized. $k$-CMIS is the restricted version of CMIS where $n_i \le k$ for all $i$. Given an instance of Max-Duo, we may construct an instance of CMIS by setting $m$ to be the number of distinct letters in the string $A$, and $n_i$ to be the number of occurrences of the $i$-th distinct letter; […] the reduction to the MIS problem, we present a vertex-degree reduction scheme and design an improved $(1.4 + \epsilon)$-approximation algorithm, for any $\epsilon > 0$. The following chart summarizes the approximation results for the 2-Max-Duo problem:

$2$ (Chen et al. 2014) $\longrightarrow$ $1.8 + \epsilon$ (Boria et al. 2014) ($\rightarrow$ falsely claimed $1.6 + \epsilon$ (Boria et al. 2014)) $\longrightarrow$ $1.4 + \epsilon$ (Theorem 3.2).
The rest of the paper is organized as follows. We provide some preliminaries in Sect. 2, including several important structural properties of the graph constructed from the two given strings. The vertex-degree reduction scheme is also presented as a separate subsection in Sect. 2. The new approximation algorithm, denoted as Approx, is presented in Sect. 3, where we show that it is a $(1.4 + \epsilon)$-approximation algorithm for 2-Max-Duo. In Sect. 4, we review the APX-hardness reduction from 3-MIS to 2-Max-Duo and point out a direction for better approximating 2-Max-Duo. We conclude the paper in Sect. 5.

Preliminaries
Consider an instance of the $k$-Max-Duo problem with two length-$n$ strings $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$ such that $B$ is a re-ordering of $A$. Recall that we can view the instance as a bipartite graph $H = (A, B, F)$, where the vertices in $A$ and $B$ are $a_1, a_2, \ldots, a_n$ in order and $b_1, b_2, \ldots, b_n$ in order, respectively, and there is an edge between $a_i \in A$ and $b_j \in B$, denoted as $e_{i,j}$, if they are the same letter. See Fig. 1a for an example, where $A = (a, b, c, d, e, f, b, c, d, e)$ and $B = (f, b, c, d, e, a, b, c, d, e)$. Note that $H$ can be constructed in $O(n^2)$ time, and $|F| \le kn$.
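The construction of $H$ can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_bipartite_edges` and the representation of $F$ as a set of 1-indexed pairs $(i, j)$ are assumptions made here for illustration, not notation from the paper.

```python
def build_bipartite_edges(A, B):
    """Return F as a set of 1-indexed pairs (i, j) such that a_i == b_j.

    Hypothetical helper: the pair (i, j) stands for the edge e_{i,j} of H.
    """
    if sorted(A) != sorted(B):
        raise ValueError("B must be a re-ordering of A")
    return {(i, j)
            for i, a in enumerate(A, start=1)
            for j, b in enumerate(B, start=1)
            if a == b}


# The instance of Fig. 1a.
A = list("abcdefbcde")
B = list("fbcdeabcde")
F = build_bipartite_edges(A, B)
print(len(F))
```

For this instance the letters a and f contribute one edge each and the letters b, c, d, e contribute four edges each, so $|F| = 18$, within the bound $|F| \le kn = 20$.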
The two edges $e_{i,j}, e_{i+1,j+1} \in F$ are called a pair of parallel edges (and they are said to be parallel to each other); when both are included in a perfect matching of $H$, the corresponding duo $(a_i, a_{i+1})$ of $A$ is preserved. Two pairs of parallel edges are conflicting if they cannot co-exist in any perfect matching of $H$. This motivates the following reduction from the $k$-Max-Duo problem to the MIS problem: From the bipartite graph $H = (A, B, F)$, we construct another graph $G = (V, E)$ in which a vertex $v_{i,j}$ of $V$ corresponds to the pair of parallel edges $(e_{i,j}, e_{i+1,j+1})$ of $F$; two vertices of $V$ are conflicting if and only if the two corresponding pairs of parallel edges are conflicting, and two conflicting vertices of $V$ are adjacent in $G$. We remark that in general the graph $G$ is not bipartite. One can see that a set of duos of $A$ that can be preserved all together, a set of pairwise non-conflicting pairs of parallel edges of $F$, and an independent set in $G$ are equivalent to each other. See Fig. 1b for an example of the graph $G = (V, E)$ constructed from the bipartite graph $H$ shown in Fig. 1a. We note that $|V| \le k(n-1)$ and thus $G$ can be constructed in $O(k^2 n^2)$ time from the instance of the $k$-Max-Duo problem.
In the graph $G$, for any $v \in V$, we use $N(v)$ to denote the set of its neighbors, that is, the vertices adjacent to $v$. The two ordered letters in the duo corresponding to the vertex $v$ are referred to as the letter content of $v$. For example, in Fig. 1b, the letter content of $v_{1,6}$ is "ab" and the letter content of $v_{6,1}$ is "fb".
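The reduction to the MIS problem can be sketched as follows. The sketch rests on one assumption that is stated here rather than in the text: two pairs of parallel edges are treated as conflicting exactly when the union of their edges is not a matching, that is, when some $a_i$ or some $b_j$ would be matched to two different partners. The names `build_conflict_graph`, `conflicting`, and `letter_content` are hypothetical.

```python
from itertools import combinations


def build_conflict_graph(A, B):
    """Return G as {vertex: set of neighbours}; vertex (i, j) stands for v_{i,j} (1-indexed)."""
    n = len(A)
    V = [(i, j) for i in range(1, n) for j in range(1, n)
         if A[i - 1] == B[j - 1] and A[i] == B[j]]

    def conflicting(u, v):
        # Assumption: conflict <=> the (at most four) edges do not form a matching.
        edges = {u, (u[0] + 1, u[1] + 1), v, (v[0] + 1, v[1] + 1)}
        return any((x == s) != (y == t)
                   for (x, y), (s, t) in combinations(edges, 2))

    adj = {v: set() for v in V}
    for u, v in combinations(V, 2):
        if conflicting(u, v):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def letter_content(A, v):
    """The two ordered letters of the duo (a_i, a_{i+1}) behind the vertex v = (i, j)."""
    i, _ = v
    return A[i - 1] + A[i]


A, B = "abcdefbcde", "fbcdeabcde"          # the instance of Fig. 1
G = build_conflict_graph(A, B)
eight = [(1, 6), (2, 7), (3, 8), (4, 9), (6, 1), (7, 2), (8, 3), (9, 4)]
is_independent = all(v not in G[u] for u, v in combinations(eight, 2))
print(len(G), max(len(ns) for ns in G.values()), is_independent)
```

The pairwise check is quadratic in $|V|$, which is in line with the $O(k^2 n^2)$ construction time noted above.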
Fig. 1 An instance of the $k$-Max-Duo problem with $k = 2$, $A = (a, b, c, d, e, f, b, c, d, e)$ and $B = (f, b, c, d, e, a, b, c, d, e)$. (a) The bipartite graph $H = (A, B, F)$. (b) The instance graph $G = (V, E)$, where the eight filled vertices form an independent set; this set $\{v_{1,6}, v_{2,7}, v_{3,8}, v_{4,9}, v_{6,1}, v_{7,2}, v_{8,3}, v_{9,4}\}$ corresponds to the eight pairs of parallel edges shown in Fig. 1a, and consequently also corresponds to the eight preserved duos. Any maximum independent set of $G$ must contain some of the degree-6 vertices, invalidating the $(1.6 + \epsilon)$-approximation algorithm for 2-Max-Duo proposed in Boria et al. (2014).

Recall from the construction that there is an edge $e_{i,j}$ in the graph $H = (A, B, F)$ if $a_i = b_j$ (therefore the subgraph of $H$ induced by a specific letter is a complete bipartite graph), and there is a vertex $v_{i,j}$ in the graph $G = (V, E)$ if the parallel edges $e_{i,j}$ and $e_{i+1,j+1}$ are in $H = (A, B, F)$.

Lemma 2.1
The graph G = (V , E) has the following properties.
[…] (1)

Proof. By definition, $v_{i,j} \in V$ if and only if $e_{i,j}, e_{i+1,j+1} \in F$.

1. If also $v_{i+2,j+2} \in V$, that is, $e_{i+2,j+2}, e_{i+3,j+3} \in F$, then $e_{i+1,j+1}, e_{i+2,j+2} \in F$, leading to $v_{i+1,j+1} \in V$.
2. Note that an edge $e_{i,j} \in F$ if and only if the two vertices $a_i$ and $b_j$ are the same letter, and clearly each connected component in $H$ is complete bipartite and all its vertices are the same letter. It follows that if the induced subgraph $H' = (A', B', F')$ in $H$ is connected, then all its vertices are the same letter. Let […] The subgraph $H'' = (A'', B'', F'')$ in $H$ has exactly the same topology as $H'$, and thus it is also connected and all its vertices are the same letter. Therefore, all the vertices of $V'$ have the same letter content; and consequently for any two vertices […]
3. For any vertex $v_{i,j}$, or equivalently the pair of parallel edges $(e_{i,j}, e_{i+1,j+1})$ in $F$, which are incident at the four vertices $a_i, a_{i+1}, b_j, b_{j+1}$, a conflicting pair of parallel edges can be one of the following six kinds: to share exactly one of $a_i$ and $a_{i+1}$, to share both $a_i$ and $a_{i+1}$, to share exactly one of $b_j$ and $b_{j+1}$, and to share both $b_j$ and $b_{j+1}$. The sets of these six kinds of conflicting pairs are as described in Eq. (1).

This proves the lemma.
From Lemma 2.1 and its proof, we see that for any vertex of $V$ there are at most $k - 1$ conflicting vertices of each kind (corresponding to a set in Eq. (1)). We thus have the following corollary.

Corollary 2.2 The maximum degree of the vertices in $G$ is $\Delta \le 6(k - 1)$.

When k = 2
We examine more properties of the graph $G = (V, E)$ when $k = 2$. First, from Corollary 2.2 we have $\Delta \le 6$. Berman and Fujito (1999) have presented an approximation algorithm with a performance ratio arbitrarily close to $(\Delta + 3)/5$ for the MIS problem on graphs with maximum degree $\Delta$. This immediately implies a $(1.8 + \epsilon)$-approximation algorithm for 2-Max-Duo. Our goal is to reduce the maximum degree of the graph $G = (V, E)$ to achieve a better approximation algorithm. To this end, we examine all the degree-6 and degree-5 vertices in the graph $G$, and show a scheme to safely remove them from consideration when computing an independent set. This gives rise to a new graph $G_2$ with maximum degree at most 4, leading to the desired $(1.4 + \epsilon)$-approximation algorithm for 2-Max-Duo.
We remark that in our scheme we first remove the degree-6 vertices from $G$ to compute an independent set, and later we add half of these degree-6 vertices to the computed independent set to obtain the final solution. Contrary to the claim that there always exists a maximum independent set in $G$ containing no degree-6 vertices (Boria et al. 2014, Lemma 1), the instance in Fig. 1 shows that any maximum independent set for the instance must contain some degree-6 vertices, thus invalidating the $(1.6 + \epsilon)$-approximation algorithm for 2-Max-Duo proposed in Boria et al. (2014).
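As a quick check of the arithmetic, and taking the ratio $(\Delta + 3)/5$ exactly as quoted from Berman and Fujito (1999), the bound evaluates as follows for the three maximum degrees that matter for 2-Max-Duo:

```latex
\[
\frac{\Delta + 3}{5} =
\begin{cases}
1.8, & \Delta = 6,\\
1.6, & \Delta = 5,\\
1.4, & \Delta = 4.
\end{cases}
\]
```

Each unit decrease of the maximum degree therefore lowers the ratio by $0.2$, which is why eliminating the degree-6 and degree-5 vertices takes the ratio from $1.8$ to $1.4$.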
In more detail, the instance of 2-Max-Duo illustrated in Fig. 1 consists of two length-10 strings $A = (a, b, c, d, e, f, b, c, d, e)$ and $B = (f, b, c, d, e, a, b, c, d, e)$. The bipartite graph $H = (A, B, F)$ is shown in Fig. 1a and the instance graph $G = (V, E)$ of the MIS problem is shown in Fig. 1b. In the graph $G$, we have six degree-6 vertices: […] On the other hand, if none of these degree-6 vertices is included in an independent set, then because the four vertices $v_{4,4}, v_{4,9}, v_{9,4}, v_{9,9}$ form a square, implying that at most two of them can be included in the independent set, the independent set would be of size at most 6, and thus can never be maximum in $G$.
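The claims about the instance of Fig. 1 are small enough to check by exhaustive search. The sketch below is not part of the algorithm Approx; it assumes that two vertices conflict exactly when the union of their parallel edges is not a matching, and the helper names `conflict_graph` and `mis_size` are hypothetical.

```python
from itertools import combinations


def conflict_graph(A, B):
    """Adjacency dict of G; vertex (i, j) stands for v_{i,j} (1-indexed)."""
    n = len(A)
    V = [(i, j) for i in range(1, n) for j in range(1, n)
         if A[i - 1] == B[j - 1] and A[i] == B[j]]

    def conflicting(u, v):
        # Assumption: conflict <=> the parallel edges of u and v do not form a matching.
        edges = {u, (u[0] + 1, u[1] + 1), v, (v[0] + 1, v[1] + 1)}
        return any((x == s) != (y == t)
                   for (x, y), (s, t) in combinations(edges, 2))

    adj = {v: set() for v in V}
    for u, v in combinations(V, 2):
        if conflicting(u, v):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mis_size(adj, allowed):
    """Exact maximum independent set size inside `allowed` (exponential time, tiny inputs only)."""
    def solve(cands):
        if not cands:
            return 0
        v = min(cands)
        rest = cands - {v}
        # Either skip v, or take v and discard its neighbours.
        return max(solve(rest), 1 + solve(rest - adj[v]))
    return solve(frozenset(allowed))


A, B = "abcdefbcde", "fbcdeabcde"
G = conflict_graph(A, B)
degree6 = sorted(v for v in G if len(G[v]) == 6)
opt_all = mis_size(G, G)
opt_without_degree6 = mis_size(G, set(G) - set(degree6))
print(degree6, opt_all, opt_without_degree6)
```

Under the stated assumption, the search finds six degree-6 vertices, a maximum independent set of size 8 in $G$, and a maximum of only 6 once the degree-6 vertices are excluded, in agreement with the discussion above.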
Consider a duo $(a_i, a_{i+1})$ of the string $A$. If its letter content is "aa", then either there is no vertex of $V$ with letter content "aa" or there is exactly one vertex of $V$ with letter content "aa", which is an isolated vertex in $G$. We thus assume without loss of generality that the letter content is "ab", where $a \ne b$.
If no duo of the string $B$ has the same letter content "ab", then this duo of the string $A$ can never be preserved (in fact, no vertex of $V$ would have letter content "ab"). If there is exactly one duo $(b_j, b_{j+1})$ of the string $B$ having the same letter content "ab", then these two duos make up a vertex $v_{i,j} \in V$, and from Lemma 2.1 we know that the degree of the vertex $v_{i,j} \in V$ is at most 5, since there is no vertex $v_{i,j'}$ with $j' \ne j$ sharing both $a_i$ and $a_{i+1}$ with $v_{i,j}$. Therefore, if the degree of the vertex $v_{i,j} \in V$ is six, then there must be two duos of the string $A$ and two duos of the string $B$ having the same letter content "ab". Assume the other duo of the string $A$ and the other duo of the string $B$ having the same letter content "ab" are $(a_{i'}, a_{i'+1})$ and $(b_{j'}, b_{j'+1})$ […] We call the subgraph of $G$ induced on these four vertices a square, and denote it as $S(i, i'; j, j')$ […] to their conflicting relationships. One clearly sees that every square has a unique letter content, which is the letter content of its four member vertices.
The following lemma is a direct consequence of how the graph $G$ is constructed and $k = 2$.

Fig. 2 The square $S(i, i'; j, j')$ shown in bold lines. The two non-adjacent vertices $v_{i,j}$ and $v_{i',j'}$ of the square form a pair stated in Corollary 2.5; they have 6 common neighbors, of which two are inside the square and four are outside of the square.

We say the two vertices $v_{i,j}$ and $v_{i+1,j+1}$ of $V$ are consecutive; and we say the two squares $S(i, i'; j, j')$ and $S(i+1, i'+1; j+1, j'+1)$ in $G$ are consecutive. Clearly, two consecutive squares contain four pairs of consecutive vertices. The following Lemma 2.7 summarizes the fact that when two consecutive vertices belong to two different squares, these two squares are also consecutive (and thus contain the other three pairs of consecutive vertices).
Proof This is a direct result of the fact that no two distinct squares have any member vertex in common, due to each square having its unique letter content.
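For $k = 2$, squares and series of consecutive squares can be located directly from the two strings. The sketch below is illustrative only: it relies on the observation above that a square arises when a duo with two distinct letters occurs twice in $A$ and twice in $B$, it represents $S(i, i'; j, j')$ by the tuple `(i, i', j, j')` with $i < i'$ and $j < j'$, and the function names `find_squares` and `maximal_series` are hypothetical.

```python
from collections import defaultdict


def find_squares(A, B):
    """Return sorted tuples (i, i', j, j') for every square S(i, i'; j, j'), assuming k = 2."""
    occ_A, occ_B = defaultdict(list), defaultdict(list)
    for i in range(1, len(A)):
        occ_A[A[i - 1], A[i]].append(i)      # 1-indexed start positions of each duo in A
    for j in range(1, len(B)):
        occ_B[B[j - 1], B[j]].append(j)      # the same for B
    squares = []
    for duo, rows in occ_A.items():
        cols = occ_B.get(duo, [])
        if duo[0] != duo[1] and len(rows) == 2 and len(cols) == 2:
            squares.append((rows[0], rows[1], cols[0], cols[1]))
    return sorted(squares)


def maximal_series(squares):
    """Group consecutive squares; return (first square, p) for each maximal series."""
    present = set(squares)
    series = []
    for s in squares:
        if tuple(x - 1 for x in s) in present:
            continue                          # s is not the first square of its series
        p, cur = 1, s
        while tuple(x + 1 for x in cur) in present:
            cur = tuple(x + 1 for x in cur)
            p += 1
        series.append((s, p))
    return series


fig1 = find_squares("abcdefbcde", "fbcdeabcde")
print(fig1, maximal_series(fig1))
```

On the instance of Fig. 1 this reports the three squares with letter contents "bc", "cd", and "de", which form a single series of three consecutive squares.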
A series of $p$ consecutive squares $\{S(i+q, i'+q; j+q, j'+q),\ q = 0, 1, \ldots, p-1\}$ […] in fact by Lemma 2.1 there can be as many as two of these four vertices existing in $V$ (however, more than two would imply the existence of the square). Similarly, there can be as many as two of the four vertices $v_{i+p,j+p}, v_{i'+p,j'+p}, v_{i+p,j'+p}, v_{i'+p,j+p}$ existing in $V$. In the sequel, a maximal series of $p$ consecutive squares starting with $S(i, i'; j, j')$ is denoted as $S^p(i, i'; j, j')$, where $p \ge 1$. See Fig. 3b for an example, where there is a maximal series of 2 consecutive squares $S^2(2, 8; 2, 8)$.

Fig. 3 An instance of 2-Max-Duo, expanded slightly from the instance shown in Fig. 1, with $A = (a, b, c, d, e, f, g, b, c, d, e, h, y, x)$ and $B = (g, b, c, d, e, h, a, b, c, d, x, y, e, f)$. The bipartite graph $H = (A, B, F)$ is shown in Fig. 3a and the constructed instance graph $G = (V, E)$ of the MIS problem is shown in Fig. 3b. There is a maximal series of 2 squares $S^2(2, 8; 2, 8)$ in $G$, associated with the four substrings "bcd". After the removal of the four substrings "bc", we achieve $A' = (a, d, e, f, g, d, e, h, y, x)$ and $B' = (g, d, e, h, a, d, x, y, e, f)$, for which the bipartite graph $H' = (A', B', F')$ is shown in Fig. 3c […]

Proof. By the definition of the square $S(i+q, i'+q; j+q, j'+q)$, for each $q = 0, 1, \ldots, p-1$, we have $a_{i+q} = a_{i'+q}$ and $a_{i+q+1} = a_{i'+q+1}$; we thus conclude that the two substrings $(a_i, a_{i+1}, \ldots, a_{i+p})$ and $(a_{i'}, a_{i'+1}, \ldots, a_{i'+p})$ are identical. In Fig. 3b, for $S^2(2, 8; 2, 8)$ the two substrings are "bcd". If these two substrings overlap, then there would be three occurrences of at least one letter, contradicting the fact that $k = 2$. This proves the first item. Next, for the square $S(i+q, i'+q; j+q, j'+q)$, if one vertex, say $v_{i+q,j+q}$, is in the maximum independent set $I^*$, then due to maximality of $I^*$ and Lemma 2.4 another vertex, $v_{i'+q,j'+q}$ in this case, is also in $I^*$.
That is, either none or exactly two vertices of the square $S(i+q, i'+q; j+q, j'+q)$ are in $I^*$. It follows that if $I^*$ contains fewer than $2p$ vertices from $S^p(i, i'; j, j')$, then there is at least one square of which no vertex is in $I^*$. Assume $r$ is the least index such that no vertex of the square $S(i+r, i'+r; j+r, j'+r)$ is in $I^*$, and assume without loss of generality that at least one of the vertices $v_{i+r+1,j+r+1}$ and $v_{i'+r+1,j'+r+1}$ is in $I^*$.
Note that the square $S(i-1, i'-1; j-1, j'-1)$ does not exist in the graph $G$, and thus at most two of its four vertices (which are $v_{i-1,j-1}, v_{i'-1,j-1}, v_{i-1,j'-1}, v_{i'-1,j'-1}$) exist in $V$. We claim that exactly two of these four vertices exist in $V$ and that both are in $I^*$. Suppose otherwise that at most one of these four vertices is in $I^*$, say $v_{i-1,j-1}$; we may increase the size of $I^*$ by removing $v_{i-1,j-1}$ together with the two vertices of the square $S(i+q, i'+q; j+q, j'+q)$, for each $q = 0, 1, \ldots, r-1$, while adding the two vertices $v_{i+q,j+q}$ and $v_{i'+q,j'+q}$ of the square $S(i+q, i'+q; j+q, j'+q)$, for each $q = 0, 1, \ldots, r$, that is, removing $2r + 1$ while adding $2r + 2$ vertices, a contradiction. Again from Lemma 2.4, these two vertices existing in $V$ are either $v_{i-1,j-1}, v_{i'-1,j'-1}$ or $v_{i'-1,j-1}, v_{i-1,j'-1}$. A symmetric argument shows that either $v_{i+p,j+p}, v_{i'+p,j'+p} \in I^*$, or $v_{i'+p,j+p}, v_{i+p,j'+p} \in I^*$.
Lastly, if $v_{i-1,j-1}, v_{i'-1,j'-1} \in I^*$ and $v_{i+p,j+p}, v_{i'+p,j'+p} \in I^*$ ($v_{i'-1,j-1}, v_{i-1,j'-1} \in I^*$ and $v_{i'+p,j+p}, v_{i+p,j'+p} \in I^*$, respectively), then $I^*$ can be expanded to include the two vertices $v_{i+q,j+q}$ and $v_{i'+q,j'+q}$ ($v_{i'+q,j+q}$ and $v_{i+q,j'+q}$, respectively) of the square $S(i+q, i'+q; j+q, j'+q)$, for each $q = 0, 1, \ldots, p-1$, a contradiction to the maximality of $I^*$. This proves the second item of the lemma.
Suppose $S^p(i, i'; j, j')$, where $p \ge 1$, exists in the graph $G$. Let $A'$ denote the string obtained from $A$ by removing the two substrings $(a_i, a_{i+1}, \ldots, a_{i+p-1})$ and $(a_{i'}, a_{i'+1}, \ldots, a_{i'+p-1})$ and concatenating the remainder together, and $B'$ denote the string obtained from $B$ by removing the two substrings $(b_j, b_{j+1}, \ldots, b_{j+p-1})$ and $(b_{j'}, b_{j'+1}, \ldots, b_{j'+p-1})$ and concatenating the remainder. Let the graph $G' = (V', E')$ denote the instance graph of the MIS problem constructed from the two strings $A'$ and $B'$. See Fig. 3d for an example of $G'$, where there is a maximal series of 2 consecutive squares $S^2(2, 8; 2, 8)$ in the graph $G$. […] vertices $v_{i'-1,j-1}$ and $v_{i+p,j+p}$. Therefore, starting with $I'$, we can add exactly $2p$ vertices from $S^p(i, i'; j, j')$ to form an independent set in $G$, of which the maximality can be proved by a simple contradiction.
We remark that in the extreme case where none of the vertices of $S(i-1, i'-1; j-1, j'-1)$ and none of the vertices of $S(i+p, i'+p; j+p, j'+p)$ are in $I'$, we may add either of the two sets of $2p$ vertices from $S^p(i, i'; j, j')$ to form a maximum independent set in $G$.
By iteratively applying the above string shrinkage process, or equivalently the vertex contracting process, each time eliminating a maximal series of consecutive squares, we achieve in $O(n)$ iterations the final graph containing no squares, which we denote as $G_1 = (V_1, E_1)$.
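One round of the string shrinkage process can be sketched as follows, given the parameters of a maximal series of $p$ consecutive squares starting with the square indexed by $i$, $i'$, $j$, $j'$. The function name `shrink` and its argument names are hypothetical, and positions are 1-indexed as in the text.

```python
def shrink(A, B, i, ip, j, jp, p):
    """Remove a_i..a_{i+p-1} and a_{i'}..a_{i'+p-1} from A, and
    b_j..b_{j+p-1} and b_{j'}..b_{j'+p-1} from B, then concatenate the remainders.

    `ip` and `jp` stand for i' and j'.
    """
    def cut(s, starts):
        drop = {t for start in starts for t in range(start, start + p)}
        return [c for pos, c in enumerate(s, start=1) if pos not in drop]

    return cut(A, (i, ip)), cut(B, (j, jp))


# The instance of Fig. 3, whose maximal series of 2 consecutive squares starts at i = 2, i' = 8, j = 2, j' = 8.
A = list("abcdefgbcdehyx")
B = list("gbcdehabcdxyef")
A1, B1 = shrink(A, B, 2, 8, 2, 8, 2)
print("".join(A1), "".join(B1))
```

On the instance of Fig. 3 this removes the four substrings "bc" and yields the two shrunken length-10 strings given in the caption of Fig. 3.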

An approximation algorithm for 2-Max-Duo
A high-level description of the approximation algorithm, denoted as Approx, for the 2-Max-Duo problem is depicted in Fig. 4.
In more detail, given an instance of the 2-Max-Duo problem with two length-$n$ strings $A$ and $B$, the first step of our algorithm is to construct the graph $G = (V, E)$, which is done in $O(n^2)$ time. In the second step (Lines 2-7 in Fig. 4), it iteratively applies the vertex contracting process presented in Sect. 2 whenever a maximal series of consecutive squares exists, and at the end it achieves the final graph $G_1 = (V_1, E_1)$ which does not contain any square. This second step can be done in $O(n^2)$ time too since each iteration of the vertex contracting process is done in $O(n)$ time and there are $O(n)$ iterations. In the third step (Lines 8-10 in Fig. 4), let $L_1$ denote the set of singletons (degree-0 vertices) and leaves (degree-1 vertices) in the graph $G_1$; our algorithm removes all the vertices of $L_1$ and their neighbors from the graph $G_1$ to obtain the remainder graph $G_2 = (V_2, E_2)$. This step can be done in $O(n^2)$ time too due to $|V_1| \le |V| \le 2n$, and the resultant graph $G_2$ has maximum degree $\Delta \le 4$ by Corollaries 2.5 and 2.6. (See Fig. 5a for an example.) In the fourth step (Lines 11-12 in Fig. 4), our algorithm calls the state-of-the-art approximation algorithm for the MIS problem (Berman and Fujito 1999) on the graph $G_2$ to obtain an independent set $I_2$ in $G_2$, and returns $I_1 = L_1 \cup I_2$ as an independent set in the graph $G_1$. The running time of this step is dominated by the running time of the state-of-the-art approximation algorithm for the MIS problem, which is a high polynomial in $n$ and $1/\epsilon$. In the last step (Line 13 in Fig. 4), using the independent set $I_1$ in $G_1$, our algorithm adds $2p$ vertices from each maximal series of $p$ consecutive squares according to Corollary 2.9, to produce an independent set $I$ in the graph $G$. (For an illustrated example see Fig. 5b.) The last step can be done in $O(n)$ time.

Fig. 5 Illustration of the algorithm on the instance shown in Fig. 3. (a) The independent set $I_1 = \{v_{1,7}, v_{7,1}, v_{10,4}, v_{11,5}, v_{5,13}\}$ in $G_1$, shown in filled circles, consisting of all the five leaves of $G_1 = G'$ shown in Fig. 3d; the state-of-the-art approximation algorithm for the MIS problem is not applied here. (b) The independent set $I$ in the original graph $G$ shown in Fig. 3b, in filled circles; according to Corollary 2.9, the four vertices $v_{2,8}, v_{3,9}, v_{8,2}, v_{9,3}$ are added due to $v_{10,4} \in I_1$. (c) The parallel edges of $H$ corresponding to the vertices of $I$, representing a feasible solution to the 2-Max-Duo instance shown in Fig. 3.
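The third and fourth steps can be sketched on an adjacency-set representation of the square-free graph $G_1$. Two points are assumptions of this sketch rather than statements of the paper: `greedy_mis` is a simple minimum-degree greedy heuristic used only as a stand-in for the algorithm of Berman and Fujito (1999), so the sketch carries no $(1.4 + \epsilon)$ guarantee, and a low-degree vertex is skipped if it has already been removed as the neighbor of a chosen one. All function names are hypothetical.

```python
def greedy_mis(adj):
    """Stand-in MIS heuristic (minimum-degree greedy); NOT the Berman-Fujito algorithm."""
    adj = {v: set(ns) for v, ns in adj.items()}
    picked = set()
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), str(u)))
        picked.add(v)
        gone = adj[v] | {v}
        adj = {u: ns - gone for u, ns in adj.items() if u not in gone}
    return picked


def approx_square_free(adj, mis_solver=greedy_mis):
    """Steps 3 and 4 on G_1, given as {vertex: set of neighbours}; returns an independent set."""
    low = [v for v, ns in adj.items() if len(ns) <= 1]    # L_1: singletons and leaves
    chosen, removed = set(), set()
    for v in low:
        if v in removed:      # assumption: skip a leaf adjacent to an already chosen leaf
            continue
        chosen.add(v)
        removed |= {v} | adj[v]
    g2 = {v: ns - removed for v, ns in adj.items() if v not in removed}   # remainder graph G_2
    return chosen | mis_solver(g2)


# A star with five leaves, the shape of G_1 in Fig. 5a (vertex labels taken from that figure).
leaves = [(1, 7), (7, 1), (10, 4), (11, 5), (5, 13)]
star = {(4, 4): set(leaves)}
star.update({leaf: {(4, 4)} for leaf in leaves})
I1 = approx_square_free(star)
print(sorted(I1))
```

On the star, all five leaves are taken in the third step and the remainder graph is empty, so the stand-in solver is never exercised, matching the remark in the caption of Fig. 5.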
The state-of-the-art approximation algorithm for the MIS problem on a graph with maximum degree $\Delta$ has a performance ratio of $(\Delta + 3)/5 + \epsilon$, for any $\epsilon > 0$ (Berman and Fujito 1999).
Lemma 3.1 In the graph $G_1 = (V_1, E_1)$, let $\mathrm{OPT}_1$ denote the cardinality of a maximum independent set in $G_1$, and let $\mathrm{SOL}_1$ denote the cardinality of the independent set $I_1$ returned by the algorithm Approx. Then, $\mathrm{OPT}_1 \le (1.4 + \epsilon)\,\mathrm{SOL}_1$, for any $\epsilon > 0$.
Proof. Recall that $L_1$ denotes the set of singletons (degree-0 vertices) and leaves (degree-1 vertices) in the graph $G_1$; our algorithm Approx removes all the vertices of $L_1$ and their neighbors from the graph $G_1$ to obtain the remainder graph $G_2 = (V_2, E_2)$. The graph $G_2$ has maximum degree $\Delta \le 4$ by Corollaries 2.5 and 2.6. Let $\mathrm{OPT}_2$ denote the cardinality of a maximum independent set in $G_2$, and let $\mathrm{SOL}_2$ denote the cardinality of the independent set $I_2$ returned by the state-of-the-art approximation algorithm for the MIS problem. We have $\mathrm{OPT}_1 = |L_1| + \mathrm{OPT}_2$ and $\mathrm{OPT}_2 \le (1.4 + \epsilon)\,\mathrm{SOL}_2$, for any $\epsilon > 0$. Therefore, since $\mathrm{SOL}_1 = |L_1| + \mathrm{SOL}_2$,
$$\mathrm{OPT}_1 = |L_1| + \mathrm{OPT}_2 \le |L_1| + (1.4 + \epsilon)\,\mathrm{SOL}_2 \le (1.4 + \epsilon)\,(|L_1| + \mathrm{SOL}_2) = (1.4 + \epsilon)\,\mathrm{SOL}_1.$$
This proves the lemma.
Theorem 3.2 The 2-Max-Duo problem can be approximated within a ratio arbitrarily close to 1.4, by a linear reduction to the MIS problem on degree-4 graphs.

Proof
We prove by induction. In the presence of maximal series of $p$ consecutive squares, we perform the vertex contracting process iteratively. In each iteration, which handles one maximal series of $p$ consecutive squares, let $G$ and $G'$ denote the graph before and after the contracting step, respectively. Let $\mathrm{OPT}'$ denote the cardinality of a maximum independent set in $G'$, and let $\mathrm{SOL}'$ denote the cardinality of the independent set $I'$ returned by the algorithm Approx. Given any $\epsilon > 0$, from Lemma 3.1, we may assume that $\mathrm{OPT}' \le (1.4 + \epsilon)\,\mathrm{SOL}'$.
Let $\mathrm{OPT}$ denote the cardinality of a maximum independent set in $G$, and let $\mathrm{SOL}$ denote the cardinality of the independent set returned by the algorithm Approx, which adds $2p$ vertices from the maximal series of $p$ consecutive squares to the independent set $I'$ in $G'$, according to Corollary 2.9, to produce an independent set $I$ in the graph $G$. Lemma 2.8 states that $\mathrm{OPT} = \mathrm{OPT}' + 2p$. Therefore,
$$\mathrm{OPT} = \mathrm{OPT}' + 2p \le (1.4 + \epsilon)\,\mathrm{SOL}' + 2p \le (1.4 + \epsilon)\,(\mathrm{SOL}' + 2p) = (1.4 + \epsilon)\,\mathrm{SOL}.$$
This proves that for the original graph $G = (V, E)$ we also have $\mathrm{OPT} \le (1.4 + \epsilon)\,\mathrm{SOL}$. That is, the worst-case performance ratio of our algorithm Approx is $1.4 + \epsilon$, for any $\epsilon > 0$. The time complexity of the algorithm Approx has been determined to be polynomial at the beginning of the section, and it is dominated by the time complexity of the state-of-the-art approximation algorithm for the MIS problem on degree-4 graphs. The theorem is thus proved.

The APX-hardness reduction from 3-MIS to 2-Max-Duo
In the above approximation algorithm Approx for 2-Max-Duo, we apply a vertex-degree reduction scheme on the constructed instance graph of the MIS problem, to remove all the degree-6 vertices and all the degree-5 vertices. This scheme essentially reduces the 2-Max-Duo problem to computing a maximum independent set in a graph of maximum degree $\Delta \le 4$. One might wonder whether all the degree-4 vertices can be similarly removed. Goldstein et al. (2004) proved that the 2-MCSP problem is APX-hard via a linear reduction from the MIS problem on cubic graphs (3-MIS); Boria et al. (2014) showed that the same reduction could also be applied to prove that 2-Max-Duo is APX-hard. In this section, we review this APX-hardness reduction from 3-MIS to 2-Max-Duo, to point out that it is unlikely that the maximum degree can be further reduced from 4 to 3 by removing all the degree-4 vertices.

Fig. 6 The instance $I_u = (A_u, B_u)$ defined for each vertex $u \in V$, where
$A_u$: $d_u \cdot\cdot\ a_u b_u \cdot\cdot\ c_u d_u e_u \cdot\cdot\ b_u e_u f_u g_u \cdot\cdot\ f_u h_u k_u \cdot\cdot\ g_u l_u \cdot\cdot\ h_u$
$B_u$: $b_u \cdot\cdot\ c_u d_u \cdot\cdot\ a_u b_u e_u \cdot\cdot\ d_u e_u f_u h_u \cdot\cdot\ f_u g_u l_u \cdot\cdot\ h_u k_u \cdot\cdot\ g_u$
The two dots between a pair of two consecutive main substrings represent $x_u^i y_u^i$ in $A_u$ and $y_u^i x_u^i$ in $B_u$, respectively, for $i = 1, 2, \ldots, 6$. Each solid or dashed line connects a pair of common duos between $A_u$ and $B_u$. The set of five duos connected by solid lines is the unique optimal solution to $I_u$.

Fig. 7 The gadget subgraph associated with the instance $I_u = (A_u, B_u)$, in which there are nine vertices corresponding to the nine common duos between $A_u$ and $B_u$.

Given a cubic graph $G = (V, E)$, an instance of 2-Max-Duo can be constructed in the following three steps.

1. For each vertex $u \in V$, define the instance $I_u = (A_u, B_u)$ as shown in Fig. 6, where both $A_u$ and $B_u$ are length-28 strings with seven main substrings, and each pair of two consecutive main substrings are separated by a substring of two letters, $x_u^i y_u^i$ in $A_u$ and $y_u^i x_u^i$ in $B_u$, respectively, for $i = 1, 2, \ldots, 6$. These 12 letters $x_u^i$ and $y_u^i$ are distinct; each appears only once in $A_u$ and is represented by a dot in Fig. 6. One can easily check that there are nine common duos between $A_u$ and $B_u$, and the set of five duos connected by solid lines in Fig. 6 is the unique optimal solution to the instance $I_u$. Equivalently, this constructs a gadget subgraph of the MIS problem, as shown in Fig. 7, in which there are nine vertices in one-to-one correspondence with the nine common duos and two vertices are adjacent if and only if they are conflicting. The vertex subset $\{a_u b_u, c_u d_u, e_u f_u, g_u l_u, h_u k_u\}$ is the unique maximum independent set in this subgraph.
2. Orient each edge in $E$ such that every vertex of $V$ has at most two incoming edges and at most two outgoing edges. This can be done by partitioning $G$ into a set of edge-disjoint cycles and a forest, followed by orienting the edges of a cycle to form a directed cycle, and rooting a tree at a leaf and then orienting the edges away from the root.
3. Let $A$ and $B$ be the concatenations of the strings $A_u$ and of the strings $B_u$, respectively, over all $u \in V$, and let the whole instance be $I = (A, B)$. For each directed edge $(u, v) \in E$, modify the instances $I_u$ and $I_v$ such that an optimal solution to $I$ coincides with at most one of the optimal solutions to $I_u$ and $I_v$.
To this purpose, either the common duo $a_v b_v$ is revised into $l_u b_v$ ($k_u b_v$, respectively) to be in conflict with only the common duo $g_u l_u$ ($h_u k_u$, respectively), or the common duo $c_v d_v$ is revised into $l_u d_v$ ($k_u d_v$, respectively) to be in conflict with only the common duo $g_u l_u$ ($h_u k_u$, respectively). These four options of modification (Goldstein et al. 2004; Boria et al. 2014) are shown in Fig. 8. Since every vertex of $V$ has at most two incoming edges and at most two outgoing edges, the revision process for the directed edge $(u, v) \in E$ can be done independently of all the other edges of $E$.

Fig. 8 The four options of modification for the directed edge $(u, v) \in E$. (a) The common duo $a_v b_v$ is revised into $l_u b_v$ to be in conflict with only the common duo $g_u l_u$. (b) The common duo $c_v d_v$ is revised into $l_u d_v$ to be in conflict with only the common duo $g_u l_u$. (c) The common duo $a_v b_v$ is revised into $k_u b_v$ to be in conflict with only the common duo $h_u k_u$. (d) The common duo $c_v d_v$ is revised into $k_u d_v$ to be in conflict with only the common duo $h_u k_u$.

Fig. 9 Four different configurations for joining the two gadget subgraphs for the vertices $u, v \in V$, in each of which a common duo is revised for the directed edge $(u, v) \in E$; for example, in the configuration corresponding to Fig. 8d, the vertex $k_u d_v$ connects the two gadget subgraphs.
One can check (or refer to the detailed proofs in Goldstein et al. (2004); Boria et al. (2014)) that there exists an independent set of size $\alpha$ in $G$ if and only if $4n + \alpha$ duos can be preserved in the instance $I$, where $n = |V|$.
The above common duo modification process for each directed edge $(u, v) \in E$ is equivalent to joining the two gadget subgraphs for the vertices $u, v \in V$ by connecting one of $g_u l_u$ and $h_u k_u$ to one of $a_v b_v$ and $c_v d_v$, but additionally revising the letter content of the common duo of $I_v$. Corresponding to the four options of modification shown in Fig. 8, the two gadget subgraphs are joined as shown in Fig. 9, respectively.
Since each directed edge $(u, v) \in E$ gives rise to exactly one of the four possible configurations shown in Fig. 9, we conclude from $G$ being cubic that exactly three of the four degree-1 vertices $\{a_u b_u, c_u d_u, g_u l_u, h_u k_u\}$ in the gadget subgraph for the vertex $u \in V$ increase their degree to 2. It follows that, regardless of the edge orientation scheme, all the vertices in the final graph $G$ have degrees 1, 2, or 4. Therefore, it is impossible to determine in polynomial time which subset of all the degree-4 vertices is in the maximum independent set of $G$.
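The properties of a single gadget stated in the first step can be checked mechanically. The sketch below rebuilds $A_u$ and $B_u$ from the main substrings listed in Fig. 6, dropping the subscript $u$ and writing the separator letters as the tokens `x1`, `y1`, and so on. As before, it assumes that two vertices conflict exactly when the union of their parallel edges is not a matching; all identifiers are hypothetical.

```python
from itertools import combinations

MAIN_A = ["d", "ab", "cde", "befg", "fhk", "gl", "h"]
MAIN_B = ["b", "cd", "abe", "defh", "fgl", "hk", "g"]


def assemble(mains, first, second):
    """Join the main substrings, separated by the two unique letters first_i second_i."""
    out = []
    for idx, block in enumerate(mains, start=1):
        out.extend(block)
        if idx < len(mains):
            out += [f"{first}{idx}", f"{second}{idx}"]
    return out


A_u = assemble(MAIN_A, "x", "y")     # separators x_i y_i in A_u
B_u = assemble(MAIN_B, "y", "x")     # separators y_i x_i in B_u
n = len(A_u)
V = [(i, j) for i in range(1, n) for j in range(1, n)
     if A_u[i - 1] == B_u[j - 1] and A_u[i] == B_u[j]]
content = {v: A_u[v[0] - 1] + A_u[v[0]] for v in V}   # letter content names each vertex


def conflicting(u, v):
    # Assumption: conflict <=> the parallel edges of u and v do not form a matching.
    edges = {u, (u[0] + 1, u[1] + 1), v, (v[0] + 1, v[1] + 1)}
    return any((x == s) != (y == t) for (x, y), (s, t) in combinations(edges, 2))


adj = {content[v]: set() for v in V}
for u, v in combinations(V, 2):
    if conflicting(u, v):
        adj[content[u]].add(content[v])
        adj[content[v]].add(content[u])

# Enumerate all maximum independent sets of the nine-vertex gadget by brute force.
maximum_sets = []
for r in range(len(adj), 0, -1):
    maximum_sets = [set(S) for S in combinations(sorted(adj), r)
                    if all(b not in adj[a] for a, b in combinations(S, 2))]
    if maximum_sets:
        break
print(len(A_u), len(V), maximum_sets, {c: len(ns) for c, ns in adj.items()})
```

Under the stated assumption, the gadget has nine vertices, its only degree-4 vertex has letter content "ef", the four vertices with letter contents "ab", "cd", "gl", "hk" have degree 1, and the unique maximum independent set has size 5.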

Conclusion
In this paper, we examined the 2-Max-Duo problem to design an improved approximation algorithm. Based on an existing linear reduction to the MIS problem (Goldstein et al. 2004; Boria et al. 2014), we presented a vertex-degree reduction scheme to reduce the maximum degree of the constructed instance graph from 6 to 4. Along the way, we uncovered several interesting structural properties of the constructed instance graph. Our main contribution is a $(1.4 + \epsilon)$-approximation algorithm for 2-Max-Duo, for any $\epsilon > 0$.
It is worth mentioning that our vertex-degree reduction technique can also be applied to $k$-Max-Duo when $k \ge 3$. For example, we have worked out the details for $k = 3$, to reduce the maximum degree of the constructed instance graph from 12 to 10, leading to a $(2.6 + \epsilon)$-approximation algorithm. Nevertheless, the $(2.6 + \epsilon)$-approximation algorithm is superseded by the $(2 + \epsilon)$-approximation algorithm for the general Max-Duo (Dudek et al. 2017).
For 2-Max-Duo, it would be interesting to investigate whether the maximum degree can be further reduced to 3, but not by determining in polynomial time which subset of all the degree-4 vertices is in the maximum independent set. On the other hand, one could examine whether certain structural properties of the 2-Max-Duo instance support a direct better-than-1.4 approximation algorithm, that is, not by calling the existing $(1.4 + \epsilon)$-approximation algorithm for the MIS problem, or not even by reducing to the MIS problem.