Representing Paths in Digraphs
Abstract
In this contribution we consider two combinatorial problems related to graph string matching, motivated by recent approaches in computational genomics. Given a DAG where each node is labeled by a symbol, the problems aim to find a path in the DAG whose nodes contain all (or the maximum number of) symbols of the alphabet. We introduce a decision problem, -Representing Path, that asks whether there exists a path that contains all the symbols of the alphabet, and an optimization problem, called Maximum Representing Path, that asks for a path that contains the maximum number of symbols. We analyze the complexity of the problems, showing the NP-completeness of -Representing Path when each symbol labels at most three nodes in the DAG, and showing the APX-hardness of Maximum Representing Path when each symbol labels at most two nodes in the DAG. We complement the first result by giving a polynomial-time algorithm for -Representing Path when each symbol labels at most two nodes in the DAG. Then we investigate the parameterized complexity of the two problems when the DAG has a limited distance from a set of disjoint paths and we show that both problems are W[1]-hard for this parameter. We consider the approximation of Maximum Representing Path, giving an approximation algorithm of factor , where is the value of an optimal solution of the problem. We also show that Maximum Representing Path cannot be approximated within factor , for any constant , unless ( is the set of nodes of the DAG).
Keywords and phrases:
Graph String Matching, Computational Complexity, Parameterized Complexity, AlgorithmsCopyright and License:
2012 ACM Subject Classification:
Theory of computation Parameterized complexity and exact algorithms ; Theory of computation Design and analysis of algorithms ; Theory of computation Graph algorithms analysisEditors:
Paola Bonizzoni and Veli MäkinenSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Graph string matching and approximate matching have been widely investigated due to their application in the context of pattern matching of hypertexts [1, 2, 20, 17] and panegenome analysis [19, 25, 5]. The problems consider a query string and a directed graph , whose nodes are labeled with symbols or strings. A path or a walk in is associated with a string that is obtained by concatenating the symbols or the strings on the path (or walk) nodes. The problems ask whether there exists a path or a walk that matches, or approximately matches, the query string. In exact matching the two strings have to be identical, while in approximated matching edit operations may be applied on the query string or on the node labels, in order to obtain two identical strings, and such edit operations have to be minimized.
Both matching and approximate matching can be solved in polynomial time if the input graph is a Directed Acyclic Graph (DAG) [17]. When the input digraph admits cycles the exact matching is solvable in polynomial time [1, 2, 20, 18, 13] and conditional lower bounds [8] show that improving the algorithms known in the literature is unlikely. The approximate matching is solvable in polynomial time when edit operations are applied only in the query string [13] and it is instead NP-hard when the edit operations may be applied on node labels [2], also for binary alphabet [13, 6].
A second research direction that has been recently investigated in sequence analysis, in the context of computational genomics, is the quest for subsequences of a given string that contain all symbols of the alphabet (and that have other specific properties) [21, 7, 15, 3, 16]. The property that each symbol of the alphabet is contained in a subsequence of a given string is applied in [21] looking for a run subsequence of maximum length, where a run subsequence contains at least one substring for each symbol of the alphabet. Another approach is considered in [15, 16] and it looks whether there exists a subsequence of that consists of substrings of repeated symbols that have length at least two, with the constraint that the subsequence contains each symbol of the alphabet. Both problems are NP-hard [21, 16] and results on their tractability have been given [21, 7, 3, 15, 16].
In this contribution, we introduce new problems that aim to integrate the two aforementioned approaches. On the one hand we consider a DAG having nodes labeled by symbols over an alphabet , since this allows us to represent variants of a sequence as in graph string matching. On the other hand we look for a path that contains all (or the maximum number of) symbols of , in a similar way to the second approach. If there exists a path in the DAG that contains all the symbols of the alphabet, we call this a representing path. We introduce a decision problem, called -Representing Path, where we ask whether there exists a representing path in a DAG . Since in some cases a representing path may not exist, we consider an optimization variant, called Maximum Representing Path, where we look for a path in that represents the maximum number of symbols in , that is the string associated with the path contains the maximum number of symbols of . Next, we summarize our results.
We start in Section 2 by presenting some concepts and by formally defining the two problems, -Representing Path and Maximum Representing Path. Then we investigate the complexity of the problems and we prove in Section 3 that -Representing Path is NP-complete even when each symbol labels at most three nodes of the input DAG, and Maximum Representing Path is APX-hard when each symbol labels at most two nodes of the DAG. Moreover, in both aforementioned cases, the input DAG has also the maximum degree bounded by three. In Section 3 we prove also a lower bound on the approximation: Maximum Representing Path cannot be approximated with factor , for any constant , unless , where is the set of nodes of the input DAG. We complement the first hardness result by showing in Section 4 that -Representing Path can be solved in polynomial time when each symbol labels at most two nodes of .
Then we study in Section 5 how the complexity of the two problems is influenced by the structure of . Observe that if consists of a set of disjoint paths (except for the source and the target nodes of ), then the two problems are easy to solve by inspecting each path independently. We show that -Representing Path and Maximum Representing Path are W[1]-hard for parameter distance to disjoint paths. Finally, in Section 6 we present an approximation algorithm for the Maximum Representing Path problem of factor , where is the value of an optimal solution of Maximum Representing Path. We conclude the paper in Section 7 pointing out some future directions. Some of the proofs are not included due to page limit (marked with ).
2 Preliminaries
We introduce some notation. For a natural number , we denote . A Directed Acyclic Graph (DAG) is a directed graph consisting of a set of nodes, a set of arcs, such that there is no directed cycle in . Given a finite alphabet and a DAG , we define a labeling function that associates a symbol with each node of , that is, .
A path in is a sequence ( represents the -th node of path ) of adjacent distinct nodes of , that is for and , for each with . The path is called a path, since it starts from and ends in . The set of symbols represented by is defined as follows:
A path in (labeled by ) is -representing if (an example is given in Fig. 1); we denote the nodes of the path as . When a path contains a node labeled by symbol , we say that covers . We denote the length of (the number of nodes in ) by . A longest path in a graph is a path having maximum length. Finding a longest path in a graph is an NP-hard problem and even hard to approximate [14], but in DAGs it can be computed in linear time [22]. Now, we define the first problem we study in this paper (we assume that the labeling is surjective, so each symbol in is associated with at least one node of the input graph ).
Problem 1.
(-Representing Path)
Input: A
DAG , with a source node ,
a target node ,
a labeling .
Output: Is there an -path in that is -representing?
Since there are cases where a DAG does not contain an -path in that is -representing (see the example in Fig. 1), we consider a second problem, where we look for an -path in that contains the maximum number of distinct symbols of .
Problem 2.
(Maximum Representing Path)
Input: A
DAG , with a source node ,
a target node ,
a labeling .
Output: An -path in that covers the maximum number of symbols in .
Given a DAG and a node , the degree of is the number of arcs incoming to or outgoing from , that is
The transitive closure of is a graph where is defined as follows:
Given two DAGs and , with , having source nodes , , respectively, and target nodes , , respectively, the concatenation of and is a DAG . Informally is obtained by adding an arc from to . The definition can be easily extended to more than two DAGs, specifying the order of concatenation. Note that the definition of concatenation holds also for paths.
3 Hardness
In this section we present hardness results for the -Representing Path and Maximum Representing Path problems. In Subsection 3.1 we show the NP-completeness of the -Representing Path problem even if when each symbol labels at most three nodes of , whose degree is bounded by three. In Subsection 3.2 we show two hardness of approximation results for the Maximum Representing Path problem: first we show that the problem is APX-hard even if each symbol labels at most three nodes of , whose degree is bounded by three; then we show a stronger result on arbitrary instances, namely that Maximum Representing Path cannot be approximated within a factor of , for any constant , unless .
3.1 NP-completeness
In this subsection we prove that the -Representing Path problem is NP-complete via a reduction from 3-SAT, a classical NP-complete problem (see, e.g., [11]). Our reduction holds even on restricted instances of the problems when each symbol labels at most three nodes of the input DAG and each node has degree bounded by three.
We recall that 3-SAT, given a formula in conjunctive normal form over a set of variables , where each clause is a disjunction of three literals (a variable or its negation), asks for an assignment of the variable set so that each clause of is satisfied.
Theorem 1.
The -Representing Path problem is NP-complete, even in the case when (1) each symbol labels at most three nodes of the DAG, that is, , and (2) the degree of each node is bounded by three.
Proof.
First, observe that the -Representing Path problem is in the class since, given an path, we can verify in polynomial time if the path contains each symbol of .
Given an instance of the 3-SAT problem, that is a Boolean formula with variables and clauses, denote the variables of the formula as and the clauses as . We construct an instance of -Representing Path, that is, a labeled DAG , as follows (see an example in Fig. 2). First, assume without loss of generality that no variable appears only negated or nonnegated – otherwise said, every variable has at least one negated and one nonnegated occurrence in the formula . If there exists such a variable, we can simply assign it to True (if it is nonnegated) or to False (if it is negated) and remove all the clauses that contain the respective variable.
-
The alphabet. For each clause , , we add a corresponding symbol to the alphabet . Moreover, for each variable , , we add symbols , to .
-
The node set. For each variable , , we add the following nodes in the node set. Assume that appears in clauses, times nonnegated and times negated. Thus, .
We first add to the node set two nodes: . Then, we add nodes and nodes . The source node node of is and the target node of is . As for the labeling, , ; assuming , , has the -th nonnegated appearance in clause then ; assuming , , has the -th negated appearance in clause then .
-
The arc set. For each we add the following arcs:
-
–
Arcs outgoing from : ,
-
–
Arcs incoming in : ,
-
–
Arcs to connect two subgraphs associated with and , where :
-
–
Arcs , with , and , with
-
–
We show now that our reduction is correct. More precisely, we show that (1) each symbol labels at most three nodes of , (2) the degree of is bounded by three and (3) the formula has a satisfying assignment, if and only if, there is an -path in that contains every symbol of .
Consider (1) and note that each symbol , , , labels exactly one node of . Moreover, since each clause consists of three literals, then each of the symbols , , labels exactly three nodes of .
Consider (2). The nodes of having degree larger than two, are possibly and , . Each has indegree at most one and outdegree two, since the arcs outgoing from are and , thus . Each has outdegree at most one and indegree two, since the arcs incoming to are and . Hence .
Now, we prove (3). First, given a satisfying assignment of , we select the path in as follows. For each , if is True, then in the subgraph corresponding to the variable , we take the path thus, covering the symbols , , and , associated with clauses where appears nonnegated; otherwise, if is assigned to False, we choose the path , thus covering the labels , , and , associated with clauses where appears negated. Between two subgraphs the path contains arc , .
Observe that we take the path that covers precisely the symbols corresponding to the clauses satisfied by . Observe that the symbols , that are associated with variables are always covered. Since the formula is satisfied by the assignment, the path obtained by concatenating the subpaths also covers all the symbols in the alphabet.
Conversely, assume that we are given an -path in that covers all the symbols in . If the -path contains the subpath , then we set to True, otherwise we set to False. The crucial observation that was also mentioned in the first part of the reduction is that the symbols covered in the subgraph associated with variable correspond to the clauses satisfied by the assignment to . Since all the labels are covered, the assignment produced satisfies all the clauses.
3.2 Hardness of Approximation
The previous result (Theorem 1) can be extended to Maximum Representing Path. By reducing from Max 2-SAT, which is known to be APX-hard [12], we show that Maximum Representing Path is APX-hard even when each symbol labels at most two nodes of the input DAG and the degree of each node is bounded by three.
Corollary 2 (∗).
The Maximum Representing Path problem is APX-hard, even in the case when (1) each symbol labels at most two nodes of the input DAG, that is, , and (2) the degree of each node is bounded by three.
We now present the second inapproximability result, namely an approximation preserving reduction from the Max -Cover problem (defined next), that cannot be approximated within factor , for any constant , unless .
Problem 3.
(Max -Cover)
Input: A collection of sets over a universe and an integer .
Output: sets from whose union have the largest cardinality.
Now, we present our second inapproximability result.
Theorem 3 (∗).
The Maximum Representing Path problem cannot be approximated within a factor of , for any constant , unless .
Proof.
(Sketch) We present an approximation preserving reduction from Max -Cover to Maximum Representing Path. Given an input instance of Max -Cover, we construct the following instance of Maximum Representing Path. First, the set of nodes of the DAG is defined as follows:
-
We add nodes , where and , labeled with the same symbol .
-
For each element , define a path of length , and each node of is labeled with a distinct symbol , .
-
Between two nodes , , , add paths, each one associated with a set (denoted by ); each path associated with consists of the concatenation of paths , where are the elements in set .
Given a solution of Max -Cover on instance such that , we can compute in polynomial time a solution of Maximum Representing Path on instance that covers symbols, by defining, between each two nodes and , , path . For the other direction, given a solution of Maximum Representing Path on instance that covers symbols, we can compute in polynomial time a solution of Max -Cover on instance , by defining as the set corresponding to the path between nodes and , .
4 A Polynomial Time Algorithm for -Representing Path when Each Symbol Labels at most Two Nodes
In this section we present a polynomial time exact algorithm (Algorithm 1) for the -Representing Path problem when each symbol in labels at most two nodes of the input DAG . We first introduce the notion of compatible nodes.
Definition 4.
We say that two nodes are compatible if there exists an -path that contains both nodes, that is either a path or . Otherwise, the nodes are called incompatible.
Algorithm 1 reduces the input instance of the -Representing Path problem when each symbol labels at most two nodes to an instance of 2-SAT, which is know to be polynomial time solvable [24, 4].
Theorem 5 (∗).
The -Representing Path problem is solvable in time in the case when each symbol labels at most two nodes of , that is, .
5 Distance from Disjoint Paths
The -Representing Path and the Maximum Representing Path problems are trivial if the DAG, after the removal of and , consists of a set of disjoint paths. Here we consider the two problems when parameterized by distance to disjoint paths and we show that -Representing Path and Maximum Representing Path are W[1]-hard for this parameter. The distance to disjoint paths is defined as the minimum number of nodes to be removed from a graph (in this case ) such that the resulting graph consists of a set of node disjoint paths (except possibly for and . We prove the hardness results by giving a parameterized reduction from the Multicolored Clique problem, defined as follows.
Problem 4.
(Multicolored Clique)
Input: An undirected graph , whose
nodes are partitioned into color classes and each edge
in connects two nodes that are in a different color class.
Output: Is there a clique in that contains exactly one
node for each color class?
A clique in that contains exactly one node for each color class is called a multicolored clique. Given an instance of the Multicolored Clique problem (recall that the partition of in color classes is given in input), we construct a corresponding instance of -Representing Path. The DAG consists of three subgraphs, that share some nodes (see Fig. 3 for a representation of the structure of ): (1) a DAG with source node and target node , (2) A DAG with source node and target node , and (3) A DAG with source node and target node .
We first give an informal description of the reduction and then we present it formally. and encode a multicolored clique (symbols not covered by a path in and represent nodes and edges of a multicolored clique), enables to cover all the symbols not selected in and (assuming a path selected in and contains all the symbols except those encoding a multicolored clique).
Consider , where , and , and , represents the -th node of color class (we assume that the nodes in each color class have some ordering). We start by giving an informal description of , and .
The DAG consists of concatenated DAGs , …, (see Fig. 4 for a sketch of ), where each , , is associated with a color class . Each , , encodes the selection of exactly one node of and it contains a path, denoted by , for each node , with . has a source node and a target node . Node has arcs to the first node of each path . Each contains the symbols associated with nodes in , except for the symbols associated with .
The DAG consists of concatenated DAGs , …, (see Fig. 5 for a sketch of ), where each , , , is associated with edges connecting nodes of color class and color class . DAG has a source node and a target node . For each edge , and , contains one path, denoted by whose nodes are labeled by one symbol associated with , and set (encoding edges between nodes of and of ) except for the symbol associated with edge .
The DAG (see Fig. 6) consists of concatenated DAGs , …, , each one associated with a color class , . Each , , has a source node and a target node . Node has arcs to paths, one path for each node , . The nodes in have labels that encode and the edges incident in . The idea is that the symbols not covered by a path in and can be covered by a path in only if the path in and contains all the symbols except those encoding edges of a multicolored clique in and one symbol for each node in the multicolored clique.
Now, we present the details of the reduction. We start by defining the alphabet and some subsets of .
We define sets , , and , , of symbols:
Given , with , we denote the subset of symbols as follows:
The set , , is defined as follows:
Note that and are different subsets, in particular that .
Now, we define the DAG . has a source node and a target node , both labeled by . is obtained by concatenating DAGs , , each one associated with a color class. Each has a source node and a target node . For each , there exists an arc from to (this defines the concatenation of subgraphs ). Now, we define each subgraph , :
-
Each subgraph has a source node and a target node , labeled by .
-
Node is connected to disjoint paths , each one associated with a node .
-
Each path is labeled by symbols
Finally, there is an arc from node to and an arc from node to node .
Next, we define the DAG . has a source node and a target node , both labeled by symbol . is obtained by concatenating DAGs , with and , each one associated with and . Each has a source node and a target node . We assume that DAGs are concatenated as follows: for with , if then there is an arc from the target of to the source of , and if (and ) there is an arc from the source of to the target of .
Now, we define each subgraph , and :
-
Each subgraph has a source node and a target node , labeled by .
-
Node is connected to paths , each one associated with a node and a node , such that .
-
Each path is labeled by the set of symbols (recall that , hence symbol does not label any node on each path ):
Finally, there is an arc from node to and an arc from node to node .
Now, we define the DAG . has source and target , both labeled by . is obtained by concatenating DAGs , , each one associated with a color class. Each subgraph , , is defined as follows:
-
Each subgraph has a source node and a target node , labeled by .
-
Between nodes and there are disjoint paths, , , each one associated with a node .
-
The nodes in each path are labeled by the following set of symbols
Note that the path contains nodes having labels that encode edges of incident in . Note also that these nodes, for each edge , have labels not .
Finally, there is an arc from to , with , an arc from (the source of ) to and an arc from to .
Having defined and its labeling, we start to prove some properties of graph and .
Lemma 6.
Consider an instance of Multicolored Clique and a corresponding instance of -Representing Path. Given a path from to in , then covers the following set of symbols:
-
1.
Each symbol ,
-
2.
A set defined as follows:
Proof.
Let be a path from to in . Nodes and , , must be traversed by any path in , hence also by , and they are labeled by . Hence point 1 holds.
We prove now point 2. Consider in particular a DAG , , and the subpath of in . By construction, traverses exactly one of , with , between and . Also note that by construction a subgraph of contains the symbols , except for the symbols , , that by construction do not label any node of , thus point 2 holds.
Lemma 7.
Consider an instance of Multicolored Clique and a corresponding instance of -Representing Path. A path in that covers all the symbols in contains a path from to in such that:
-
1.
For each traversed by in , traverses a subgraph , for some , and covers a set such that traverses path
-
2.
For each , covers a set of symbols, where and is defined as follows:
Proof.
Let be a path from to that covers all the symbols in . First, consider point 1. By Lemma 6, the path in does not cover the set of symbols , , , such that is traversed by . Each symbol , with , and , labels only nodes of and , thus if it is not covered by in it must be covered in . It follows that for each traversed by in , traverses a subgraph . This implies that covers
and point 1 is proven.
Now, we consider point 2. In each , and , path traverses exactly one subpath , for some and , whose nodes have labels . Then in covers a set of symbols, where and
hence point 2 is proven. Based on Lemma 6 and Lemma 7, we can prove the main result of this section.
Lemma 8 (∗).
Consider an instance of Multicolored Clique and a corresponding instance of -Representing Path. Then, contains a multicolored clique if and only if there exists a path in that covers all the symbols in .
has distance from a set of disjoint paths bounded by , since by removing the source and target node of each , with , of each , with and , and of each , with , we obtain a set of disjoint paths. Since Multicolored Clique is W[1]-hard when parameterized by [10], we can prove the following theorem.
Theorem 9 (∗).
-Representing Path is W[1]-hard when parameterized by distance to disjoint paths.
We extend the result of Theorem 9 to the Maximum Representing Path problem.
Corollary 10 (∗).
Maximum Representing Path is W[1]-hard when parameterized by distance to disjoint paths.
6 An Approximation Algorithm
In this section we present a polynomial time approximation algorithm for the Maximum Representing Path problem that achieves an approximation factor of , where is the number of distinct symbols in an optimal solution. Notice that, since , our algorithm is also a -approximation. Informally, the algorithm is as follows. First, we create a compatibility DAG , that is essentially the transitive closure of the input DAG (see Section 2 for the definition of transitive closure). Then, we consider a total order on , e.g., standard alphabetic order, such that we have , for two nodes and if the label of precedes the label of based on this order. We create two subgraphs of (that are implicitly DAGs), and as follows:
-
1.
where if and only if and
-
2.
where if and only if and .
Observe that and that the set of nodes of and is . We then compute , a longest path between and in , and , a longest path between and in . As pointed out in Section 2, a longest path in a DAG can be computed in linear time. The algorithm (formally presented in Algorithm 2) outputs the path , , that has a largest number of distinct symbols. Theorem 11 proves that it is indeed a -approximation for Maximum Representing Path.
Theorem 11.
Algorithm 2 is a -approximation for the Maximum Representing Path problem and requires time.
Proof.
First, we prove that Algorithm 2 computes a feasible solution of Maximum Representing Path, that is, the path returned by the Algorithm 2 is an -path in D. Consider the DAG . Since is the transitive closure of , it follows that for any path in there exists a corresponding path in , since for any arc , there is a path from to in . Thus, given a path in , we can compute a corresponding path in by concatenating the paths associated with arcs in . Since and are subgraphs of , in particular they have the same set of nodes and a subset of the arcs of , any path in or corresponds to a path in , hence also in . Moreover, we add (, respectively) to the returned path if (or , respectively) is not part of the path , and possibly a path from to the first node of (a path from the last node of to , respectively), hence the algorithm returns an path in .
We show now the approximation factor of Algorithm 2. Let be an optimal solution of the Maximum Representing Path problem, that is, a path such that is maximized. Let be a subset of the nodes that appear on the path in this order and have pairwise distinct labels, that is
According to the Erdös-Szekeres theorem [9], every sequence of distinct integers contains a monotonic (increasing or decreasing) sequence of length . Thus, the string associated with contains a a monotonic (increasing or decreasing) sequence of length at least , hence the path contains nodes , such that appears before in , for and , and either or . Notice that since is a path in , for any two nodes such that , we have . Thus, the path is either a path in or in , thus it has length not larger than that of or of , respectively. Since the path returned by Algorithm 2 is obtained by taking the nodes in one of that covers the maximum number of symbols (and possibly adding other nodes), and each , , contains nodes with distinct labels, then . Thus, Algorithm 2 is a -approximation algorithm for Maximum Representing Path.
We consider now the time complexity of Algorithm 2. Step 1 can be computed in time [23] and contains arcs. Step 2 and Step 3, can be computed in time by traversing . Step 4 and 5 can be computed in and time [26], respectively, where and are bounded by . Thus the overall time complexity is and, since we can assume that no node is isolated, then , thus the overall time complexity is .
7 Conclusion
In this contribution we have introduced two combinatorial problems (-Representing Path and Maximum Representing Path) that ask to identify a path in a node labeled DAG that contains all (or a subset of maximum size) of the alphabet symbols. We have proved results on the computational complexity and parameterized complexity of the two problems, and we have studied the approximation of Maximum Representing Path.
One of the most interesting future directions is to further investigate the approximate complexity of the Maximum Representing Path problem: does it admit constant factor approximation algorithms? It is interesting to design approximation algorithms for some restrictions on the DAG structure, for example when the degree is bounded. It is also interesting to study other structural properties of the DAG that may lead to polynomial-time algorithms for both problems.
References
- [1] Tatsuya Akutsu. A linear time pattern matching algorithm between a string and a tree. In Combinatorial Pattern Matching, 4th Annual Symposium, CPM 93, Padova, Italy, June 2-4, 1993, Proceedings, pages 1–10, 1993. doi:10.1007/BFb0029792.
- [2] Amihood Amir, Moshe Lewenstein, and Noa Lewenstein. Pattern matching in hypertext. Journal of Algorithms, 35(1):82–99, 2000. doi:10.1006/jagm.1999.1063.
- [3] Yuichi Asahiro, Hiroshi Eto, Mingyang Gong, Jesper Jansson, Guohui Lin, Eiji Miyano, Hirotaka Ono, and Shunichi Tanaka. Approximation algorithms for the longest run subsequence problem. In Laurent Bulteau and Zsuzsanna Lipták, editors, 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, June 26-28, 2023, Marne-la-Vallée, France, volume 259 of LIPIcs, pages 2:1–2:12. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.CPM.2023.2.
- [4] Bengt Aspvall, Michael F. Plass, and Robert Endre Tarjan. A linear-time algorithm for testing the truth of certain quantified boolean formulas. Inf. Process. Lett., 8(3):121–123, 1979. doi:10.1016/0020-0190(79)90002-4.
- [5] Jasmijn A. Baaijens, Paola Bonizzoni, Christina Boucher, Gianluca Della Vedova, Yuri Pirola, Raffaella Rizzi, and Jouni Sirén. Computational graph pangenomics: a tutorial on data structures and their applications. Nat. Comput., 21(1):81–108, 2022. doi:10.1007/S11047-022-09882-6.
- [6] Riccardo Dondi, Giancarlo Mauri, and Italo Zoppis. On the complexity of approximately matching a string to a directed graph. Inf. Comput., 288:104748, 2022. doi:10.1016/J.IC.2021.104748.
- [7] Riccardo Dondi and Florian Sikora. The longest run subsequence problem: Further complexity results. In Pawel Gawrychowski and Tatiana Starikovskaya, editors, 32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021, July 5-7, 2021, Wrocław, Poland, volume 191 of LIPIcs, pages 14:1–14:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.CPM.2021.14.
- [8] Massimo Equi, Veli Mäkinen, Alexandru I. Tomescu, and Roberto Grossi. On the complexity of string matching for graphs. ACM Trans. Algorithms, 19(3):21:1–21:25, 2023. doi:10.1145/3588334.
- [9] Paul Erdös and George Szekeres. A combinatorial problem in geometry. Compositio mathematica, 2:463–470, 1935.
- [10] Michael R. Fellows, Danny Hermelin, Frances A. Rosamond, and Stéphane Vialette. On the parameterized complexity of multiple-interval graph problems. Theor. Comput. Sci., 410(1):53–61, 2009. doi:10.1016/J.TCS.2008.09.065.
- [11] Michael R Garey and David S Johnson. Computers and intractability, volume 174. freeman San Francisco, 1979.
- [12] Johan Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001. doi:10.1145/502090.502098.
- [13] Chirag Jain, Haowen Zhang, Yu Gao, and Srinivas Aluru. On the complexity of sequence to graph alignment. In Lenore J. Cowen, editor, Research in Computational Molecular Biology - 23rd Annual International Conference, RECOMB 2019, Washington, DC, USA, May 5-8, 2019, Proceedings, volume 11467 of Lecture Notes in Computer Science, pages 85–100. Springer, 2019. doi:10.1007/978-3-030-17083-7_6.
- [14] David R. Karger, Rajeev Motwani, and G. D. S. Ramkumar. On approximating the longest path in a graph. Algorithmica, 18(1):82–98, 1997. doi:10.1007/BF02523689.
- [15] Manuel Lafond, Wenfeng Lai, Adiesha Liyanage, and Binhai Zhu. The longest subsequence-repeated subsequence problem. In Weili Wu and Jianxiong Guo, editors, Combinatorial Optimization and Applications - 17th International Conference, COCOA 2023, Hawaii, HI, USA, December 15-17, 2023, Proceedings, Part I, volume 14461 of Lecture Notes in Computer Science, pages 446–458. Springer, 2023. doi:10.1007/978-3-031-49611-0_32.
- [16] Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, and Peng Zou. The longest letter-duplicated subsequence and related problems. Acta Informatica, 61(3):315–329, 2024. doi:10.1007/S00236-024-00459-7.
- [17] Udi Manber and Sun Wu. Approximate string matching with arbitrary cost for text and hypertext. In Advances in Structural and Syntactic Pattern Recognition, pages 22–33, 1992. doi:10.1142/9789812797919_0002.
- [18] Gonzalo Navarro. Improved approximate pattern matching on hypertext. Theoretical Computer Science, 237(1-2):455–463, 2000. doi:10.1016/S0304-3975(99)00333-3.
- [19] Ngan Nguyen, Glenn Hickey, Daniel R. Zerbino, Brian J. Raney, Dent Earl, Joel Armstrong, W. James Kent, David Haussler, and Benedict Paten. Building a pan-genome reference for a population. Journal of Computational Biology, 22(5):387–401, 2015. doi:10.1089/cmb.2014.0146.
- [20] Kunsoo Park and Dong Kyue Kim. String matching in hypertext. In Zvi Galil and Esko Ukkonen, editors, Combinatorial Pattern Matching, 6th Annual Symposium, CPM 95, Espoo, Finland, July 5-7, 1995, Proceedings, volume 937 of Lecture Notes in Computer Science, pages 318–329. Springer, 1995. doi:10.1007/3-540-60044-2_51.
- [21] Sven Schrinner, Manish Goel, Michael Wulfert, Philipp Spohr, Korbinian Schneeberger, and Gunnar W. Klau. Using the longest run subsequence problem within homology-based scaffolding. Algorithms Mol. Biol., 16(1):11, 2021. doi:10.1186/S13015-021-00191-8.
- [22] Robert Sedgewick and Kevin Wayne. Algorithms (Fourth edition deluxe). Addison-Wesley, 2016.
- [23] Steven Skiena. The Algorithm Design Manual, Third Edition. Texts in Computer Science. Springer, 2020. doi:10.1007/978-3-030-54256-6.
- [24] Robert Endre Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2):146–160, 1972. doi:10.1137/0201010.
- [25] The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1):118–135, 2018. doi:10.1093/bib/bbw089.
- [26] David P. Williamson and David B. Shmoys. The Design of Approximation Algorithms. Cambridge University Press, 2011. URL: http://www.cambridge.org/de/knowledge/isbn/item5759340/?site_locale=de_DE.
