Order-Preserving Pattern Matching Indeterminate Strings

Given an indeterminate string pattern $p$ and an indeterminate string text $t$, the problem of order-preserving pattern matching with character uncertainties ($\mu$OPPM) is to find all substrings of $t$ that satisfy one of the possible orderings defined by $p$. When the text and pattern are determinate strings, we are in the presence of the well-studied exact order-preserving pattern matching (OPPM) problem, which has diverse applications in time series analysis. Despite its relevance, the exact OPPM problem suffers from two major drawbacks: 1) the inability to deal with indetermination in the text, thus preventing the analysis of noisy time series; and 2) the inability to deal with indetermination in the pattern, thus imposing the strict satisfaction of the orders among all pattern positions. This paper provides the first polynomial-time algorithm to answer the $\mu$OPPM problem when indetermination is observed on the pattern or text. Given two strings with length $m$ and $O(r)$ uncertain characters per string position, we show that the $\mu$OPPM problem can be solved in $O(mr\lg r)$ time when one string is indeterminate and $r\in\mathbb{N}^+$. Mappings into satisfiability problems are provided when indetermination is observed on both the pattern and the text, and results concerning the general problem complexity are presented as well, with the $\mu$OPPM problem proved to be NP-hard in general.


Introduction
Given a pattern string p and a text string t, the exact order-preserving pattern matching (OPPM) problem is to find all substrings of t with the same relative orders as p. The problem is applicable to strings with characters drawn from numeric or ordinal alphabets. For illustration, given p = (1, 5, 3, 3) and t = (5, 1, 4, 2, 2, 5, 2, 4), substring t[1..4] = (1, 4, 2, 2) is reported since it satisfies the character orders in p, p[0] ≤ p[2] = p[3] ≤ p[1]. Despite its relevance, the OPPM problem has limited potential since it prevents the specification of errors, uncertainties or don't care characters within the text.
Order-preserving pattern matching captures the structural isomorphism of strings, therefore having a wide range of relevant applications in the analysis of financial time series, musical sheets, physiological signals and biological sequences [1,2,3]. Uncertainties often occur across these domains. In this context, although the OPPM problem is already a relaxation of the traditional pattern matching problem, the need to further handle localized errors is essential to deal with noisy strings [4]. For instance, given the stochasticity of gene regulation (or markets), the discovery of order-preserving patterns in gene expression (or financial) time series needs to account for uncertainties [5,6]. Numerical indexes of amino acids (representing physicochemical and biochemical properties) are subject to errors that hinder the analysis of protein sequences [7]. Another example is given by ordinal strings obtained from the discretization of numerical strings, which often have two uncertain characters in positions where the original values are near a discretization boundary [4].
Let m and n be the lengths of the pattern p and text t, respectively. The exact OPPM problem has a solution that is linear in the text length, O(n + m lg m), based on the Knuth-Morris-Pratt algorithm [8,2,9]. Alternative algorithms for the OPPM problem have also been proposed [10,11,12]. In contrast with the large attention given to the resolution of the OPPM problem, to our knowledge there are no polynomial-time algorithms to solve the µOPPM problem. Naive algorithms for µOPPM assess all possible pattern and text assignments, bounded by O(n r^m) when considering up to r uncertain characters per position.
This work proposes the first polynomial-time algorithms able to answer the µOPPM problem. Accordingly, the contributions are organized as follows. First, we show that whether an indeterminate string of length m order-preserving matches a determinate string of the same length can be decided in O(mr lg r) time based on their monotonic properties. Second, given two indeterminate strings with the same size, we provide a linear encoding of the µOPPM into a satisfiability formula with properties of interest. Furthermore, we extend this encoding and present results concerning the computational complexity of µOPPM problem variations, namely a proof that the µOPPM problem is NP-hard in general. Third, given pattern and text strings with lengths m and n, only one of them indeterminate, we show that the µOPPM problem can be solved in linear space and its average efficiency boosted under effective filtration procedures.
A preliminary version of this work was presented at the Annual Symposium on Combinatorial Pattern Matching (CPM) [13]. In this paper, we revise previous results and we present new results concerning the computational complexity of µOPPM problem; Sections 3.3, 3.4 and 5 are new.

Background
Let Σ be a totally ordered alphabet and an element of Σ * be a string. The length of a string w is denoted by |w|. The empty string ε is a string of length 0. For a string w = xyz, x, y and z are called a prefix, substring, and suffix of w, respectively. The i-th character of a string w is denoted by w[i] for each 0 ≤ i < |w|. For a string w and integers 0 ≤ i ≤ j < |w|, w[i..j] denotes the substring of w from position i to position j. For convenience, let w[i..j] = ε when i > j.
Given strings x and y with equal length m, y is said to be order-preserving against x [8], denoted by x ≈ y, if the orders between the characters of x and y are the same, i.e. x[i] ≤ x[j] ⇔ y[i] ≤ y[j] for any 0 ≤ i, j < m. A non-empty pattern string p is said to order-preserving match (op-match in short) a non-empty text string t if and only if there is a position i in t such that p ≈ t[i − |p| + 1..i]. The order-preserving pattern matching (OPPM) problem is to find all such text positions.
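The definitions in this section can be checked directly on small inputs. The following is a minimal quadratic-time sketch, not one of the linear-time algorithms cited in this paper; the function names `order_preserving` and `op_matches` are chosen for illustration only. It reuses the example strings from the Introduction and reports matches by the end position i of the matching window.

```python
from typing import List, Sequence


def order_preserving(x: Sequence[int], y: Sequence[int]) -> bool:
    """Return True when x and y have equal length and the same relative orders."""
    if len(x) != len(y):
        return False
    m = len(x)
    # Compare every pair of positions, as in the definition of x ≈ y.
    return all((x[i] <= x[j]) == (y[i] <= y[j])
               for i in range(m) for j in range(m))


def op_matches(p: Sequence[int], t: Sequence[int]) -> List[int]:
    """Return every end position i such that p ≈ t[i-|p|+1..i]."""
    m = len(p)
    return [i for i in range(m - 1, len(t))
            if order_preserving(p, t[i - m + 1:i + 1])]


p = (1, 5, 3, 3)
t = (5, 1, 4, 2, 2, 5, 2, 4)
print(op_matches(p, t))  # [4], i.e. the window t[1..4] = (1, 4, 2, 2)
```

Each window is tested in O(m^2) time here, so the sketch is only meant to make the definition concrete.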

The Problem
Given a totally ordered alphabet Σ, an indeterminate string is a sequence of disjunctive sets of characters x[0]x[1]..x[m − 1], where x[i] ⊆ Σ. Given an indeterminate string x, a valid assignment $x is a (determinate) string with a single character at position i, denoted $x[i], taken from the set x[i]. Given a determinate string x of length m, an indeterminate string y of equal length is said to be order-preserving against x, identically denoted by x ≈ y, if there is a valid assignment $y such that the relative orders of the characters in x and $y are the same, i.e. x[i] ≤ x[j] ⇔ $y[i] ≤ $y[j] for any 0 ≤ i, j < m.
Given two indeterminate strings x and y with length m, y preserves the orders of x, x ≈ y, if there exists a valid assignment $y of y that respects the orders of a valid assignment $x of x.
A non-empty indeterminate pattern string p is said to order-preserving match (op-match in short) a non-empty indeterminate text string t if and only if there is a position i in t such that p ≈ t[i − |p| + 1..i]. The problem of order-preserving pattern matching with character uncertainties (µOPPM) is to find all such text positions.
To understand the complexity of the µOPPM problem, let us look at its solution from a naive stance, yet considering state-of-the-art OPPM principles. The algorithmic proposal by Kubica et al. [8] is still, to this date, the one providing the lowest bound, O(n + q), where q = m for alphabets of size m^O(1) (q = m lg m otherwise). Given a determinate string x of length m, an integer i (0 ≤ i < m) is said in the context of this work to be an order-preserving border of x if x[0..i] ≈ x[m − i − 1..m − 1]. In this context, given a pattern string p, the orders between the characters of p are used to linearly infer the order borders. The order borders can then be used within the Knuth-Morris-Pratt algorithm to find op-matches against a text string t in linear time [8].
Given a determinate string p of length m and an indeterminate string t of length n, the previous approach is a direct candidate to solve the µOPPM problem by decomposing t into all its possible assignments, O(r^n). Since determinate assignments to t are only relevant in the context of m-length windows, this approach can be improved to guarantee a maximum of O(r^m) assignments at each text position. Despite its simplicity, this solution is bounded by O(n r^m). This complexity is further increased when indetermination is also considered in the pattern, stressing the need for more efficient alternatives.
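The naive window-by-window enumeration can be sketched as follows. This is an illustrative baseline only, assuming the indeterminate text is given as a list of Python sets; the function names and the example text are hypothetical and not taken from the paper. The exponential cost comes from `itertools.product`, which enumerates up to r^m assignments per window.

```python
from itertools import product


def same_orders(x, y):
    """True when the determinate strings x and y have the same relative orders."""
    m = len(x)
    return all((x[i] <= x[j]) == (y[i] <= y[j])
               for i in range(m) for j in range(m))


def naive_mu_oppm(p, t):
    """p: determinate pattern; t: indeterminate text as a list of sets.

    Returns the end positions of the windows that admit at least one
    valid assignment with the same relative orders as p.
    """
    m = len(p)
    hits = []
    for i in range(m - 1, len(t)):
        window = [sorted(chars) for chars in t[i - m + 1:i + 1]]
        # Enumerate every valid assignment of the window (up to r^m of them).
        if any(same_orders(p, cand) for cand in product(*window)):
            hits.append(i)
    return hits


p = (1, 4, 3, 1)
t = [{3}, {2}, {4, 5}, {3, 5}, {1, 2}, {6}]  # hypothetical example text
print(naive_mu_oppm(p, t))  # [4]
```

Only the window ending at position 4, (2, 4|5, 3|5, 1|2), has a valid assignment, for instance (2, 4, 3, 2).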

Related work
The exact OPPM problem is well-studied in literature. Kubica et al. [8], Kim et al. [2] and Cho et al. [9] presented linear time solutions on the text length by respectively combining order-borders, rank-based prefixes and grammars with the Knuth-Morris-Pratt (KMP) algorithm [14]. Cho et al. [10], Belazzougui et al. [11], and Chhabra et al. [12] presented O(nm) algorithms that show a sublinear average complexity by either combining bad character heuristics with the Boyer-Moore algorithm [15] or applying filtration strategies. Recently, Chhabra et al. [16] proposed further principles to solve OPPM using word-size packed string matching instructions to enhance efficiency.
In the context of numeric strings, multiple relaxations to the exact pattern matching problem have been pursued to guarantee that approximate matches are retrieved. In norm matching [17,18,19,20], matches between numeric strings occur if a given distance threshold f(x, y) ≤ θ is satisfied. In (δ,γ)-matching [21,22,23,24,25,26,27], strings are matched if the maximum difference of the corresponding characters is at most δ and the sum of differences is at most γ.
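For concreteness, the (δ,γ)-matching condition on two equal-length numeric strings can be written as a short check. This is a minimal sketch of the definition only, with a hypothetical function name; it is not one of the cited algorithms.

```python
def delta_gamma_match(x, y, delta, gamma):
    """True when max |x[i] - y[i]| <= delta and sum |x[i] - y[i]| <= gamma."""
    if len(x) != len(y):
        return False
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return max(diffs, default=0) <= delta and sum(diffs) <= gamma


print(delta_gamma_match((1, 5, 3), (2, 4, 3), delta=1, gamma=2))  # True
print(delta_gamma_match((1, 5, 3), (2, 4, 3), delta=1, gamma=1))  # False
```

Unlike order-preserving matching, this test depends on the magnitudes of the characters and not only on their relative orders.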
Despite the relevance of the aforementioned contributions to answer the exact order-preserving pattern matching and generic pattern matching, they cannot be straightforwardly extended to efficiently answer the µOPPM problem.

On solving µOPPM
Section 3.1 introduces the first efficient algorithm to solve the µOPPM problem when one string is indeterminate (r ∈ N + ). Section 3.2 discusses the existence of efficient solvers when both strings are indeterminate. Section 3.3 introduces then a polynomial time algorithm for the Alternate-µOPPM as a subproblem of µOPPM where both strings may have indeterminate characters, but never in the same position. Given the formulations proposed in Section 3.2, we hypothesize that op-matching indeterminate strings with an arbitrary number of uncertain characters per position (r ∈ N + ) is in class NPC. Furthermore, we show in Section 3.4 that the problem {3,3}-µOPPM, defined as the subproblem of µOPPM where both the pattern and the text have indeterminate characters in any position (although at least one position must have at least three indeterminate characters in both pattern and text), is NP-hard. We still leave a gap in between these two groups, namely for the strings where there are at most two indeterminate characters in both strings at the same position. It remains open whether or not this problem is NP-hard.
Proof. (⇒) If the length of the longest increasing subsequence (LIS), |w|, equals the number of monotonic relations in x, |y′_π|, then y ≈ x. By sorting characters in descending order per position, we guarantee that at most one character per position in y′_π appears in the LIS (respecting monotonic orders in x given y′_π properties). By intersecting characters in positions of y with identical characters in x, we guarantee the eligibility of characters satisfying equality orders in x; otherwise, empty positions in y′_π are observed and the LIS length is less than |y′_π|. (⇐) If the LIS length is less than |y′_π|, there is no assignment in y that op-matches x due to one of two reasons: 1) there are empty positions in y′_π due to the inability to satisfy equalities in x, or 2) it is not possible to find a monotonically increasing assignment to y′_π and, given the properties of y′_π, y_π cannot preserve the orders of x_π.
Solving the LIS task on a string of size n takes O(n lg n) time [42], where n = |z| = O(rm). In addition, set intersection operations are performed O(m) times on sets with O(r) size, which can be accomplished in O(rm lg r) time. As a result, the µOPPM problem with one indeterminate string can be solved in O(rm lg(rm)) time.
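The LIS-based test described in the proof can be sketched as follows. This is a minimal illustration under stated assumptions: x is a sequence of integers, y is a list of Python sets, positions are grouped by the value of x (playing the role of the sorted permutation), and the function name `op_match_lis` is hypothetical. A strictly increasing subsequence is computed with the standard patience-sorting technique using `bisect_left`.

```python
from bisect import bisect_left


def op_match_lis(x, y):
    """Return True if some valid assignment of y preserves the orders of x."""
    # Group the positions of x that hold the same character.
    groups = {}
    for i, v in enumerate(x):
        groups.setdefault(v, []).append(i)

    # Build the candidate string z: groups in ascending order of x, each
    # contributing the intersection of its sets in descending order.
    z = []
    for v in sorted(groups):
        common = set.intersection(*(set(y[i]) for i in groups[v]))
        z.extend(sorted(common, reverse=True))

    # Length of the longest strictly increasing subsequence of z.
    tails = []
    for c in z:
        k = bisect_left(tails, c)
        if k == len(tails):
            tails.append(c)
        else:
            tails[k] = c

    # Descending order inside a group means at most one character per
    # group can take part in a strictly increasing subsequence.
    return len(tails) == len(groups)


x = (1, 4, 3, 1)
y = [{2}, {4, 5}, {3, 5}, {1, 2}]
print(op_match_lis(x, y))  # True
```

For this example z = (2, 5, 3, 5, 4) and the longest strictly increasing subsequence has length 3, which equals the number of distinct characters in x.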
Given the fact that the candidate string for the LIS task has properties of interest, we can improve the complexity of this computation (Theorem 3.2) in accordance with Algorithm 1.
Algorithm 1: O(mr lg r) µOPPM algorithm with one indeterminate string.
Proof. In accordance with Algorithm 1, µOPPM is bounded by the verification of equalities, O(mr lg r) [43]. Testing inequalities after set intersections can be performed in time linear in the size of y, O(mr), improving the O(mr lg(mr)) bound given by the LIS computation.
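One way to test the inequalities with a single pass after the set intersections is a greedy scan that always keeps the smallest admissible character. The sketch below is an assumption-laden illustration of that idea and not a reproduction of Algorithm 1, whose listing is not included in this text; the function name `op_match_greedy` is hypothetical, and the sketch sorts the characters of x itself, which is not accounted for in the bound discussed above.

```python
def op_match_greedy(x, y):
    """Return a valid assignment of y preserving the orders of x, or None."""
    # Positions of x holding the same character must receive the same character.
    groups = {}
    for i, v in enumerate(x):
        groups.setdefault(v, []).append(i)

    chosen = {}
    prev = None
    for v in sorted(groups):
        # Equalities: only characters shared by all positions are eligible.
        common = set.intersection(*(set(y[i]) for i in groups[v]))
        # Inequalities: the character must exceed the previous choice.
        candidates = [c for c in common if prev is None or c > prev]
        if not candidates:
            return None
        prev = min(candidates)  # smallest choice leaves most room for later groups
        chosen[v] = prev
    return [chosen[v] for v in x]


print(op_match_greedy((1, 4, 3, 1), [{2}, {4, 5}, {3, 5}, {1, 2}]))  # [2, 4, 3, 2]
print(op_match_greedy((1, 2), [{3}, {1, 2}]))                        # None
```

Choosing the smallest admissible character is safe because any valid assignment can be rewritten, group by group, to use a character that is no larger without violating the later inequalities.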
The analysis of Algorithm 1 further reveals that the µOPPM problem with one indeterminate string requires linear space in the text length, O(mr).

µOPPM with indeterminate pattern and text
As indetermination in real-world strings is typically observed between pairs of characters [4], a key question is whether µOPPM on two indeterminate strings is in class P when r = 2. To explore this possibility, new concepts need to be introduced. In OPPM research, character orders in a determinate string of length m can be decomposed into three sequences with m unit sets: Leq, Lmax and Lmin capture =, > and < relationships between each character x[i] in x and the closest preceding character x[k]. These orders can be inferred in linear time for alphabets of size m^O(1) and in O(m lg m) time for other alphabets by answering the "all nearest smaller values" task on the sorted indexes [8]. Figure 1 depicts Leq, Lmax and Lmin for x = (1, 4, 3, 1). Given determinate strings x and y, When allowing uncertainties between pairs of characters, previous research on the OPPM problem cannot be straightforwardly extended due to the need to trace O(2^m) assignments on indeterminate strings.
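A quadratic-time sketch of these order sets, and of how they allow two determinate strings to be compared position by position, is given below. The exact definition used in the paper is not reproduced in this text, so the sketch makes explicit assumptions: Leq[i] holds the latest preceding position with an equal character, Lmax[i] a preceding position with the largest smaller character, and Lmin[i] a preceding position with the smallest larger character. The function names are hypothetical, and the sketch does not use the "all nearest smaller values" technique that yields the stated time bounds.

```python
def order_sets(x):
    """Return (Leq, Lmax, Lmin) as lists of unit (or empty) sets of positions."""
    leq, lmax, lmin = [], [], []
    for i, v in enumerate(x):
        equal = [k for k in range(i) if x[k] == v]
        smaller = [k for k in range(i) if x[k] < v]
        larger = [k for k in range(i) if x[k] > v]
        leq.append({equal[-1]} if equal else set())
        lmax.append({max(smaller, key=lambda k: x[k])} if smaller else set())
        lmin.append({min(larger, key=lambda k: x[k])} if larger else set())
    return leq, lmax, lmin


def op_match_incremental(x, y):
    """Check x ≈ y for determinate strings by testing one prefix at a time."""
    leq, lmax, lmin = order_sets(x)
    for t in range(len(x)):
        if any(y[t] != y[a] for a in leq[t]):
            return False
        if any(not y[t] > y[b] for b in lmax[t]):
            return False
        if any(not y[t] < y[c] for c in lmin[t]):
            return False
    return True


print(order_sets((1, 4, 3, 1)))
print(op_match_incremental((1, 4, 3, 1), (2, 5, 3, 2)))  # True
print(op_match_incremental((1, 4, 3, 1), (2, 4, 5, 2)))  # False
```

Under these assumptions each new position is compared against at most three earlier positions, which is what makes the extension of a matching prefix cheap in the determinate case.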

Lemma 3.4. Given a determinate string x, an indeterminate string y, and the singleton sets
Proof. (⇒) In accordance with Leq, Lmax and Lmin definition, for any a ∈ A, b ∈ B and c ∈ C we have ; and by the assumption of the lemma, , yielding the stated equivalence.
Given two strings of equal length, the µOPPM problem can be schematically represented according to the identified order restrictions. Figure 2 represents restrictions on the indeterminate string y = (2, 4|5, 3|5, 1|2) in accordance with the observed orders in x = (1, 4, 3, 1). The left side edges are placed in accordance with Lemma 3.4 and capture assessments on the orders between pairs of characters. The right side edges capture incompatibilities detected after the assessments, i.e. pairs of characters that cannot be selected simultaneously (for instance, y[0] = 2 and y[3] = 1, or y[1] = 4 and y[2] = 5). For the given example, there are two valid assignments, $y1 = (2, 4, 3, 2) and $y2 = (2, 5, 3, 2), that preserve the orders of x. To verify whether there is an assignment that satisfies the identified ordering restrictions, we propose the reduction of the µOPPM problem to a Boolean satisfiability problem.
Given a set of Boolean variables, a formula in conjunctive normal form (CNF) is a conjunction of clauses, where each clause is a disjunction of literals, and a literal corresponds to a variable or its negation. Let a 2CNF formula be a formula in conjunctive normal form with at most two literals per clause. Given a CNF formula, the satisfiability (SAT) problem is to verify whether there is an assignment of values to the Boolean variables such that the CNF formula is satisfied.

If the established φ formula is satisfiable, there is a Boolean assignment to the variables that specifies an assignment of characters in y, $y, preserving the orders of x (as defined by Leq, Lmax and Lmin). Otherwise, it is not possible to select an assignment $y op-matching x. φ has at most r × m variables, {z_{i,σ} | i ∈ {0..m − 1}, σ ∈ Σ}. The Boolean value assigned to a variable z_{i,σ} simply defines whether the associated character σ from y[i] is considered (when true) or not (when false) to compose a valid assignment $y that op-matches the given determinate string x. The reduced formula in (1) is composed of two types of clauses. Clauses of the first type specify the need to select at least one character per position in y to guarantee the presence of valid assignments. The remaining clauses specify ordering constraints between characters. If an inequality, such as $y[i] > $y[j], is assessed as true, the associated clause is removed. Otherwise, (¬z_{i,σ1} ∨ ¬z_{j,σ2}) is derived, meaning that these σ1 and σ2 characters should not be selected simultaneously since they do not satisfy the orders defined by a given pattern. For instance, the pairs of characters in orange from Figure 2 should not be simultaneously selected due to order conflicts. To this end, (¬z_{0,2} ∨ ¬z_{3,1}) and (¬z_{1,4} ∨ ¬z_{2,5}) clauses need to be included to verify if y ≈ x. Considering y = (2, 4|5, 3|5, 1|2) and x = (1, 4, 3, 1), schematically represented in Figure 2, the associated CNF formula is:

Proof. Given the fact that a 2SAT problem can be solved in linear time [44]¹, this proof directly derives from Theorem 3.6 as it guarantees the soundness of reducing µOPPM (r = 2) to a 2SAT problem with a CNF formula with O(m) size.
As the size of the mapped CNF formula φ is O(m) and a valid algorithm to verify its satisfiability would require the construction of a graph with O(m) nodes and edges, the required memory for the target µOPPM problem is Θ(m).
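The reduction to 2SAT for r ≤ 2 can be sketched end to end as follows. This is an illustration under stated assumptions rather than the paper's exact formula: conflict clauses are generated for every pair of positions instead of only the pairs given by Leq, Lmax and Lmin, so the formula built here has O(m^2) clauses rather than O(m). Satisfiability is decided with the classical implication-graph and strongly-connected-components technique (Kosaraju's two-pass variant); the function names and the literal encoding (2v for a variable, 2v + 1 for its negation) are hypothetical choices.

```python
def solve_2sat(n_vars, clauses):
    """clauses: pairs of literals, 2*v for variable v and 2*v+1 for its negation.
    Returns a list of Booleans satisfying every clause, or None."""
    n = 2 * n_vars
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:
        # (a or b) is equivalent to (not a => b) and (not b => a).
        for u, w in ((a ^ 1, b), (b ^ 1, a)):
            graph[u].append(w)
            rgraph[w].append(u)
    # First pass: iterative DFS recording vertices by finishing time.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, k = stack.pop()
            if k < len(graph[u]):
                stack.append((u, k + 1))
                w = graph[u][k]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(u)
    # Second pass: label components on the reversed graph.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for w in rgraph[u]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1
    if any(comp[2 * v] == comp[2 * v + 1] for v in range(n_vars)):
        return None  # a variable and its negation share a component
    return [comp[2 * v] > comp[2 * v + 1] for v in range(n_vars)]


def op_match_2sat(x, y):
    """x: determinate string; y: list of sets with one or two characters each.
    Returns a valid assignment of y preserving the orders of x, or None."""
    if any(not 1 <= len(chars) <= 2 for chars in y):
        raise ValueError("this sketch assumes one or two characters per position")
    var = {}
    for i, chars in enumerate(y):
        for s in sorted(chars):
            var[(i, s)] = len(var)  # index of variable z_{i,s}
    clauses = []
    for i, chars in enumerate(y):
        cs = sorted(chars)
        clauses.append((2 * var[(i, cs[0])], 2 * var[(i, cs[-1])]))  # select at least one
    sign = lambda a, b: (a > b) - (a < b)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            for s1 in y[i]:
                for s2 in y[j]:
                    if sign(s1, s2) != sign(x[i], x[j]):  # order conflict
                        clauses.append((2 * var[(i, s1)] + 1, 2 * var[(j, s2)] + 1))
    model = solve_2sat(len(var), clauses)
    if model is None:
        return None
    return [next(s for s in sorted(y[i]) if model[var[(i, s)]]) for i in range(len(y))]


print(op_match_2sat((1, 4, 3, 1), [{2}, {4, 5}, {3, 5}, {1, 2}]))
```

For the running example the call returns one of the two valid assignments, (2, 4, 3, 2) or (2, 5, 3, 2). Any character whose variable is true may be picked at each position, because the conflict clauses forbid two true variables from violating an order.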
When moving from one to two indeterminate strings, previous contributions are insufficient to answer the µOPPM problem. In this context, the Leq, Lmax and Lmin vectors need to be redefined to be inferred from an indeterminate string:

($y[t + 1] = $y[a] ∧ $y[t + 1] > $y[b] ∧ $y[t + 1] < $y[c])
Proof. (⇒) Similar to the proof of Lemma 3.4, yet A, B and C conditional to x[t + 1] (Definition 3.3) are now given by A_j, B_j and C_j conditional to x_j[t + 1] (Definition 3.8). If there is an assignment to y[1..t + 1] in $y that preserves one

To verify whether there is an assignment that satisfies the identified ordering restrictions, Theorem 3.11 extends the previously introduced SAT mapping given by (1).

¹ 2SAT problems have linear time and space solutions on the size of the input formula. Consider for instance the original proposal [44]: the formula φ is modeled by a directed graph G = (V, E), with two nodes per variable z_i in φ (z_i and ¬z_i) and two directed edges for each clause z_i ∨ z_j (the equivalent implicative forms ¬z_i ⇒ z_j and ¬z_j ⇒ z_i). Given G, the strongly connected components (SCCs) of G can be discovered in O(|V| + |E|). During the traversal, if a variable and its complement belong to the same SCC, then the procedure stops as φ is determined to be unsatisfiable. Given the fact that both |V| = O(m) and |E| = O(m) by Lemma 3.6, this procedure is O(m) time and space.
Proof. If x ≈ y then φ is satisfiable, and if x does not op-match y then φ is not satisfiable. When x does not op-match y, there is no assignment of values $x ∈ x and $y ∈ y such that $x ≈ $y. Per formulation, in the absence of an order-preserving match, conflicts will prevent the assignment of at least one variable z_{i,$x[i],$y[i]} per i-th position, thus making the φ formula unsatisfiable.
If the formula in (2) is satisfiable, there is a Boolean assignment to the variables such that there is an assignment of characters in y, $y, and in x, $x, such that both strings op-match. Otherwise, it is not possible to select assignments such that x ≈ y. Given r = 2, the established φ formula has at most 4m variables, {z i,σ1,σ2 | i ∈ {0 . . . m − 1}, σ 1 , σ 2 ∈ Σ}. The Boolean values assigned to these variables define whether characters σ 1 ∈ x[i] and σ 2 ∈ y[i] belong to an op-match. The reduced formula is composed of two major types of clauses: • Those in the first line of (2) ∨ ¬z j,σ2 ), meaning that these characters should not be selected simultaneously in the given positions (see Figure 4).
Proof. The reduced formula in (2) is in conjunctive normal form (CNF) with at most 4m clauses in the first line of (2) and a maximum of O(mr) orders per position (Remark 3.9), totalling at most O((mr)^2) order conflicts between characters, from the restriction clauses in the remainder of (2).
Although we are no longer in the conditions of Theorem 3.7, namely because the above satisfiability formulation is not a 2SAT instance, given its unique properties, effective backtracking in accordance with the clauses in the first line of (2), as well as dedicated conflict pruning principles derived from the remaining clauses in (2), can be considered to develop efficient SAT solvers able to solve the µOPPM problem. And, as we will show later, we are not expected to do much better.

Polynomial time Alternate-µOPPM
In this section, we define Alternate-µOPPM as the subproblem of µOPPM where both strings (x and y, interchangeable) may have indeterminate characters, but never in the same position; we show that Alternate-µOPPM is polynomial in both the number of indeterminacies (r, which may be different in each position and string) and length of the strings (m). To do this, we will present a set of 2SAT clauses, in the form of implications, that can represent every constraint of this problem. We will first assume that there are no repeated characters within each string and then extend the reduction to handle equalities.
Given a string x and position i, we represent the set of indeterminate characters x[i] as the ascending sequence a_0|...|a_{r_i − 1}, where ∀j a_j ∈ x[i] and |x[i]| = r_i. We will use only r when the context leads to no ambiguities, or to mean the largest possible r_i. All of our 2SAT variables will be of the form g_{a_j}, meaning that the chosen value $x[i] is greater than or equal to a_j.
Consistency clauses. Here, we describe the clauses that maintain consistency between all the g variables for individual positions. We only need to specify that, if we have chosen a value greater than a_i, we have also chosen a value greater than a_{i−1}, the value immediately below it, i.e., g_{a_i} ⟹ g_{a_{i−1}}. This leads to a single clause per indeterminacy, per position, for both pattern and text, and so, at most, 2mr = O(mr) clauses.
Order clauses (Type 1). Here, we describe the clauses enforcing the order relation between each pair of positions. Given two strings x and y, for positions α and β, if $x[α] > $x[β], then $y[α] > $y[β] (and the same for the < relation).
This first set of clauses applies to Type 1 (see Table 1). We only need to find the index (in each string) that separates the cases where $x[α] > $x[β] from the cases where $x[α] < $x[β] and add a single constraint expressing it.
Let i be the lowest index such that b_i > a and j the lowest index such that a_j ≥ b, where a and b are as in Table 1. Then, we have g_{b_i} ⟹ ¬g_{a_j} and ¬g_{b_i} ⟹ g_{a_j}.
This leads to two clauses for every pair of positions, and so, O(m^2) clauses.
Order clauses (Type 2). Finally, we have a second set of clauses that applies to Type 2 (see Table 2). Here, we have the order between α and β fixed already by whichever string x or y has no indeterminacies.
If a > b, for every index i indexing b_i, let j be the lowest index such that a_j > b_i. Then we add g_{b_i} ⟹ g_{a_j}.
If there is no such j, we add instead ¬g_{b_i}. Similarly, if a < b, for every index i indexing a_i, let j be the lowest index such that a_i < b_j. Then we add g_{a_i} ⟹ g_{b_j}. If there is no such j, we add instead ¬g_{a_i}.
This leads to at most r clauses for every pair of positions, and so O(rm^2) clauses. Because character order is a transitive property, this type of clause may be reduced to O(rm) using a similar notion to the Lmax and Lmin sets introduced in Section 3.2 to consider only "adjacent" (taking adjacent to mean the closest position of the same type) pairs of positions, instead of every pair.
Forcing choice. With the clauses specified above, we can find coherent solutions to the problem. However, it is possible to satisfy the formula by assigning false to all the variables of a given position (effectively skipping the position). This has a straightforward solution, given the chosen encoding of the variables. Since each 2SAT variable represents a greater or equal value in the corresponding OPPM position, the variable corresponding to the lowest value for each position is trivially true, letting us force a value choice with a single added clause. For every position, with variables g_0, ..., g_{r_i}, we add the clause g_0, forcing it to be true to satisfy the 2SAT formula.
Extracting solutions. Finally, we need to extract the solution to the OPPM problem from the 2SAT solution. This is easily done in linear time by sweeping every variable in ascending order, in each position. In each position, with variables g_0, ..., g_{r_i}, we find the variable at index j such that g_j is true and g_{j+1} is false. The chosen value in the OPPM problem, for the given position, is the value at index j.
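The per-position part of this encoding (consistency clauses, forced choice and solution extraction) can be illustrated in isolation. The sketch below is a minimal example under assumptions: the variables of one position are indexed 0..r − 1 in ascending order of their characters, implications are stored as (premise, conclusion) pairs, and all function names are hypothetical. It enumerates every truth assignment by brute force to show that the satisfying ones correspond one-to-one to the characters of the position.

```python
from itertools import product


def position_clauses(r):
    """Clauses for one position with r candidate characters (indices 0..r-1).

    Returns (implications, units): implication (k, k-1) encodes g_k => g_{k-1},
    and the unit clause 0 forces g_0 to be true.
    """
    implications = [(k, k - 1) for k in range(1, r)]
    units = [0]
    return implications, units


def satisfies(truth, implications, units):
    """Check a truth assignment against the consistency and unit clauses."""
    return (all(truth[u] for u in units)
            and all((not truth[a]) or truth[b] for a, b in implications))


def extract_value(values, truth):
    """Sweep upwards to the last index j with g_j true and return its character."""
    j = 0
    while j + 1 < len(values) and truth[j + 1]:
        j += 1
    return values[j]


values = [2, 5, 7]  # ascending characters of one indeterminate position
implications, units = position_clauses(len(values))
models = [t for t in product([False, True], repeat=len(values))
          if satisfies(t, implications, units)]
print([extract_value(values, t) for t in models])  # [2, 5, 7]
```

The highest index has no successor, so the sweep simply stops there when every variable of the position is true.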
Dealing with equalities. We now turn to cases where characters match and show how to adapt the encoding above to equalities. Let us consider Type II equalities, first, where a = b. The easy solution to this is the same as the one presented before. We preprocess the two strings by grouping all the repeats into a single position and intersecting their indeterminacies. For Type I equalities, we need to add 4 clauses to each pair. Let i, j be indexes such that a = b i and b = a j . We add If only i exists (or j), we simply remove b i (or a j ) from the input, as such an assignment could never lead to a valid solution.
Pair incompatibility. All the clauses described above serve to maintain consistency between pairs. It may happen that a given pair is unsatisfiable by itself, and no clauses would be constructed. These cases can be dealt with separately, as pre-processing. If we find a pair that cannot be satisfied, we can terminate the program before ending the construction, since there is no solution to the OPPM instance.

Proof. Property resulting from the encoding above and, as in the proof of Theorem 3.7, given the fact that a 2SAT problem can be solved in linear time [44].

µOPPM with 3 indeterminacies in both text and pattern is NP-hard
In this section, we define {3, 3}-µOPPM as the subproblem of µOPPM where both the pattern and the text have indeterminate characters in any position (although at least one position must have at least three indeterminate characters in both pattern and text) and prove it NP-hard (thus proving the same for general µOPPM). We do this with a direct reduction from 3CNF-SAT, first presenting the construction and then the proof of equivalence between the two instances. The construction is similar to the one by Bose et al. for the permutation matching problem [45].

Construction.
To ease the description of the construction itself, we start by describing how we represent an instance of 3CNF-SAT. First, we assume that every literal and clause has some ordering. We have a set V of literals, and a set C of clauses. Each clause c is represented by two tuples, (z_{c,0}, z_{c,1}, z_{c,2}) and (l_{c,0}, l_{c,1}, l_{c,2}). z_{c,i} ∈ {0, ..., |V| − 1} represents the index of literal i of clause c; l_{c,i} ∈ {0, 1} represents the value of literal i in clause c, having the value of 0 for positive literals and 1 for negative literals. For example, the clause (v_1 ∨ ¬v_2 ∨ v_5) would be represented by the two tuples z = (1, 2, 5) and l = (0, 1, 0).
Although the designations of text or pattern are interchangeable in this section, we will use pattern for the simpler string (with fewer indeterminacies) and text for the more complicated string (with more indeterminacies). We use p and t for the pattern and text, respectively, or s when they are interchangeable.
Both text and pattern have two parts, one representing literals and the other representing clauses. Each literal, and clause, has a single position in each string to represent it, dividing s into s_V = s[0..|V| − 1] and s_C = s[|V|..|V| + |C| − 1]. In p_V, we have a simple sequence of literals given by their indexes, so p[i] = i + 1, for i ∈ {0, ..., |V| − 1}; in t_V we have a similar sequence, but each literal takes one of two values to represent an assignment of true or false, so t[i] = 2 × (i + 1) or 2 × (i + 1) − 1. We choose the larger value to represent the assignment of true. In s_C, each position has three indeterminacies, corresponding to the three variables of the clause. In p_C, we choose one of the three literals of the respective clause. For clause c, with literals v_1, v_2, v_5 (regardless of their value being positive or negative), its position in p is p[|V| + c] = 1|2|5. In t_C, as in p_C, we choose one of the literals, but now the value of the literal must satisfy the clause. For the clause c = (v_1 ∨ ¬v_2 ∨ v_5), t[|V| + c] = 2|3|10. An example of this construction is shown in Table 3.
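The construction can be sketched and checked by brute force on tiny formulas. The code below is an illustration under assumptions: variables are numbered from 1 as in the example clause, a clause is a tuple of signed integers (k for v_k and −k for ¬v_k, a convention chosen for this sketch), indeterminate strings are lists of Python sets, and the function names are hypothetical. Both the op-match test and the satisfiability test are exponential and only serve to compare the two instances.

```python
from itertools import product


def build_strings(n_vars, clauses):
    """Build the pattern p and text t (lists of sets) from a 3CNF formula."""
    p = [{k} for k in range(1, n_vars + 1)]                 # p_V: one value per variable
    t = [{2 * k - 1, 2 * k} for k in range(1, n_vars + 1)]  # t_V: false | true
    for clause in clauses:
        p.append({abs(l) for l in clause})                  # p_C: which variable is used
        t.append({2 * abs(l) if l > 0 else 2 * abs(l) - 1   # t_C: value satisfying the literal
                  for l in clause})
    return p, t


def same_orders(x, y):
    """True when the determinate strings x and y have the same relative orders."""
    m = len(x)
    return all((x[i] <= x[j]) == (y[i] <= y[j])
               for i in range(m) for j in range(m))


def op_match_both(p, t):
    """Brute force over all valid assignments of both indeterminate strings."""
    p_opts = [sorted(s) for s in p]
    t_opts = [sorted(s) for s in t]
    return any(same_orders(a, b)
               for a in product(*p_opts) for b in product(*t_opts))


def satisfiable(n_vars, clauses):
    """Brute-force satisfiability test of the formula."""
    return any(all(any((l > 0) == bits[abs(l) - 1] for l in c) for c in clauses)
               for bits in product([False, True], repeat=n_vars))


p, t = build_strings(5, [(1, -2, 5)])
print(p[5], t[5])  # the clause position: 1|2|5 in p and 2|3|10 in t

formula = [(1, 2, 3), (-1, -2, 3)]
print(satisfiable(3, formula), op_match_both(*build_strings(3, formula)))  # True True
```

On an unsatisfiable input such as the four clauses over v_1 and v_2 that exclude every assignment, both tests return False, which is consistent with the equivalence argued in this section.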
Lemma 3.14. The construction above takes polynomial time.
Proof. It is easy to see that, assuming that variables and clauses are numbered, we can simply scan the formula once to construct our two strings in linear time.

Lemma 3.15. The initial 3CNF-SAT instance is satisfiable if and only if there is an order-isomorphic match between the two constructed strings.
Proof. We start by showing how solving the µOPPM instance solves the initial 3CNF-SAT instance. To solve µOPPM, we need to choose exactly one value for each position in p and t that leads to two order-isomorphic strings. To extract the solution, we can limit ourselves to looking at the initial part of t, t[0..|V| − 1], which sets the value of each literal. First, note that the function of p is to maintain consistency between the values of literals chosen in t. By choosing only literals in p, and not their values, we force equality between all such literals. Because of order-isomorphism, this equality must be kept in t, forcing a valid solution to use a single value for each literal (since different values match in p but mismatch in t). If we choose a literal to be positive/negative at some position in t, we force the value of that literal to be positive/negative at every position in t. Now, we focus on t_C. Every clause has exactly one position in t_C, and each of these positions has three choices of value, matching only the three values that satisfy a clause. Because we must choose one value in each position to solve our µOPPM instance, we must choose one value that satisfies each clause, for every clause.
Putting these two properties together, to solve µOPPM we must choose a literal value that satisfies each clause and those literals must have consistent values. This establishes the equivalence between the solutions of the two instances.
We can easily extract the solution from µOPPM to 3CNF-SAT by checking whether the values in t_V are even or odd, meaning true or false, respectively. There is a unique solution to 3CNF-SAT given a µOPPM solution.
To extract the solution from 3CNF-SAT to µOPPM, we take the values assigned to each variable and choose the respective values in t_V. Then, we need to choose values for p_C and t_C, which can easily be done by choosing any of the literals that satisfies its respective clause. There may be multiple µOPPM solutions for a given 3CNF-SAT solution.

Irrespective of the answer, the analysis of the average complexity is of complementary relevance. State-of-the-art research on the exact OPPM problem shows that the average performance of algorithms in O(nm) time can outperform linear time algorithms [12,46,47].
The properties of the proposed encoding guarantee that the exact matches of p ′ in t ′ cannot skip any op-match of p in t. Thus, when combining the premises of Lemma 4.1 with the previous observation, we guarantee that the computed µOPPM solution is sound.
The application of this simple filtration procedure avoids repeating the O(mr lg r) verification n − m + 1 times. Instead, the complexity of the proposed method to solve the µOPPM problem becomes O(dmr lg r + n) (when one string is indeterminate), where d is the number of exact matches (d ≪ n). According to previous work on exact OPPM with filtration procedures [12], SBNDM2 and SBNDM4 algorithms [48] (Boyer-Moore variants) were suggested to match binary encodings. In the presence of small patterns, Fast Shift-Or (FSO) [49] can be alternatively applied [12].
A given string text can be read and encoded incrementally from the standard input as needed to perform µOPPM, thus requiring O(mr) space. When filtration procedures are considered, the aforementioned algorithms for exact pattern matching require O(m) space [12], thus µOPPM space requirements are bounded by substring verifications (Section 3): O(mr) space when one string is indeterminate and O((mr)^2) when indetermination is considered on both strings.

Open problem
We can look at the µOPPM by the number and position of the indeterminate characters. We have shown that, for any number of indeterminacies, µOPPM has a polynomial-time algorithm for indeterminate characters in a single string (Section 3.1), or in both strings, but never in both strings at the same position (Section 3.3). For indeterminate characters in both strings at the same position, we have also shown that for at least three indeterminacies (at select positions), the problem is NP-hard (Section 3.4).
There is a gap in between these two groups, however, for the strings where there are at most two indeterminate characters in both strings at the same position. It remains open whether or not this problem is NP-hard. Given that our reduction from Section 3.4 uses three indeterminate characters in both strings, it also remains open whether the problem with two indeterminate characters in one string and three in the other (at the same position) is NP-hard.
Following the pattern-avoidance precedent by Guillemot and Vialette [50] for the related problem of permutation matching, we note that, for the case of µOPPM with at most two indeterminate characters (both strings, same position), there is a straightforward encoding in 2SAT for (1|3, 2|4)-avoiding strings, here taken to mean that, in a single string, for the pair of positions (i, j), the rank of the characters (only for the pair in question) is not 1|3 in i and 2|4 in j (with i and j being interchangeable). The full problem, however, remains open.

Concluding remark
This work addressed the relevant yet scarcely studied problem of finding order-preserving pattern matches on indeterminate strings (µOPPM). We showed that the problem has a linear time and space solution when one string is indeterminate. In addition, the µOPPM problem (when both strings are indeterminate) was mapped into a satisfiability formula of polynomial size with two simple types of clauses in order to study efficient solvers for the µOPPM problem. Moreover, the µOPPM problem was shown to be NP-hard in general. Finally, we showed that solvers of the µOPPM problem can be boosted in the presence of filtration procedures, and we identified a still open problem concerning the computational complexity of the µOPPM problem when restricted to at most two indeterminate characters in both strings at the same position.