
Branch Prediction Analysis of Morris-Pratt and Knuth-Morris-Pratt Algorithms

Cyril Nicaud, Univ Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France; Carine Pivoteau, Univ Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France; Stéphane Vialette, Univ Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France
Abstract

We investigate the classical Morris-Pratt and Knuth-Morris-Pratt pattern matching algorithms from the perspective of computer architecture, focusing on the effects of incorporating a simple branch prediction mechanism into the computational model. Assuming a fixed pattern and a random text, we derive precise estimates for the number of branch mispredictions incurred by these algorithms when using local predictors. Our analysis relies on tools from automata theory and Markov chains, offering a theoretical framework that can be extended to other text processing algorithms and more sophisticated branch prediction strategies.

Keywords and phrases:
Pattern matching, branch prediction, transducers, average case complexity, Markov chains
Copyright and License:
© Cyril Nicaud, Carine Pivoteau, and Stéphane Vialette; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Editors:
Paola Bonizzoni and Veli Mäkinen

1 Introduction

Pipelining is a fundamental technique employed in modern processors to improve performance by executing multiple instructions in parallel, rather than sequentially. Instruction execution is divided into distinct stages, enabling the simultaneous processing of multiple instructions, much like an assembly line in a factory. Today, pipelining is ubiquitous, even in low-cost processors priced under a dollar. We refer the reader to the textbook reference [6] for more details on this important architectural feature.

The sequential execution model assumes that each instruction completes before the next one begins; however, this assumption does not hold in a pipelined processor. Specific conditions, known as hazards, may prevent the next instruction in the pipeline from executing during its designated clock cycle. Hazards introduce delays that undermine the performance benefits of pipelining and may stall the pipeline, thus reducing the theoretical speedup:

  • Structural hazards occur when resource conflicts arise, preventing the hardware from supporting all possible combinations of instructions during simultaneous executions.

  • Control hazards arise from the pipelining of jumps and other instructions that modify the order in which instructions are processed, by updating the Program Counter (PC).

  • Data hazards occur when an instruction depends on the result of a previous instruction, and this dependency is exposed by the overlap of instructions in the pipeline.

To minimize stalls due to control hazards and improve execution efficiency, modern computer architectures incorporate branch predictors. A branch predictor is a digital circuit that anticipates the outcome of a branch (e.g., an if–then–else structure) before it is determined.

Two-way branching is typically implemented using a conditional jump instruction, which can either be taken, updating the Program Counter to the target address specified by the jump instruction and redirecting the execution path to a different location in memory, or not taken, allowing execution to continue sequentially. The outcome of a conditional jump remains unknown until the condition is evaluated and the instruction reaches the actual execution stage in the pipeline. Without branch prediction, the processor would be forced to wait until the conditional jump instruction reaches this stage before the next instruction can enter the first stage in the pipeline. The branch predictor aims to reduce this delay by predicting whether the conditional jump is likely to be taken or not. The instruction corresponding to the predicted branch is then fetched and speculatively executed. If the prediction is later found to be incorrect (i.e., a misprediction occurs), the speculatively executed or partially executed instructions are discarded, and the pipeline is flushed and restarted with the correct branch, incurring a small delay.

The effectiveness of a branch prediction scheme depends on both its accuracy and the frequency of conditional branches. Static branch prediction is the simplest technique, as it does not depend on the code execution history and, therefore, cannot adapt to program behavior. In contrast, dynamic branch prediction takes advantage of runtime information (specifically, branch execution history) to determine whether branches were taken or not, allowing it to make more informed predictions about future branches.

A vast body of research is dedicated to dynamic branch prediction schemes. At the highest level, branch predictors are classified into two categories: global and local. A global branch predictor does not maintain separate history records for individual conditional jumps. Instead, it relies on a shared history of all jumps, allowing it to capture their correlations and improve prediction accuracy. In contrast, a local branch predictor maintains an independent history buffer for each conditional jump, enabling predictions based solely on the behavior of that specific branch. Since the 2000s, modern processors typically employ a combination of local and global branch prediction techniques, often incorporating even more sophisticated designs. For a deeper exploration of this topic, see [10], and for a comprehensive discussion of modern computer architecture, refer to [6].

In this study, we focus on local branch predictors implemented with saturated counters. A 1-bit saturating counter (essentially a flip-flop) records the most recent branch outcome. Although this is the simplest form of dynamic branch prediction, it offers limited accuracy. A 2-bit saturating counter (see Figure 1), by contrast, operates as a state machine with four possible states: Strongly not taken (ν¯), Weakly not taken (ν), Weakly taken (τ), and Strongly taken (τ¯). When the 2-bit saturated branch predictor is in the Strongly not taken or Weakly not taken state, it predicts that the branch will not be taken and execution will proceed sequentially. Conversely, when the predictor is in the Strongly taken or Weakly taken state, it predicts that the branch will be taken, meaning execution will jump to the target address. Each time a branch is evaluated, the corresponding state machine updates its state. If the branch is not taken, the state shifts toward Strongly not taken; if taken, it moves toward Strongly taken. A misprediction (corresponding to a bold edge in Figure 1) occurs when:

  • a branch is not taken, while the predictor is in either of the Taken states (τ or τ¯);

  • a branch is taken, while the predictor is in either of the Not Taken states (ν or ν¯).

This mechanism gives the 2-bit saturating counter an advantage over the simpler 1-bit scheme: starting from a Strongly state, a branch must deviate twice in a row from its usual behavior before the prediction changes, reducing the likelihood of mispredictions.

Figure 1: The 2-bit saturated predictor consists of four states: ν¯ and ν predict that the branch will not be taken, while τ and τ¯ predict that it will. The predictor updates at each condition evaluation, transitioning via T when the branch is taken (i.e., the condition is true) and via N when it is not. Bold edges indicate mispredictions.
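
To make the update rule concrete, here is a minimal Python sketch of such a 2-bit saturating counter (the integer encoding of the four states and the helper name run_predictor are ours, not taken from the paper):

```python
def run_predictor(state, outcomes):
    """Feed a word of branch outcomes ('T' = taken, 'N' = not taken) to a 2-bit
    saturating counter. States 0 and 1 predict "not taken", states 2 and 3 predict
    "taken". Returns the final state and the number of mispredictions
    (the bold edges of Figure 1)."""
    mispredictions = 0
    for o in outcomes:
        taken = (o == "T")
        if taken != (state >= 2):
            mispredictions += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return state, mispredictions

# Starting from the weakly-not-taken state (1), the word NNNTT ends in the
# weakly-taken state (2) with two mispredictions: run_predictor(1, "NNNTT") == (2, 2).
```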

This paper presents an initial theoretical investigation into the techniques and results related to the analysis of pattern matching algorithms within computational models augmented by branch prediction mechanisms. In particular, we study the classical Morris-Pratt (MP) and Knuth-Morris-Pratt (KMP) algorithms [11, 7] in such models, with a primary focus on quantifying branch mispredictions under random text inputs. Over the past two decades, research in this area has evolved from experimental studies on sorting algorithms and fundamental data structures [2] to theoretical analyses of misprediction behavior in Java’s dual-pivot Quicksort [9], as well as the design of skewed variants of classical algorithms (e.g., binary search, exponentiation by squaring) aimed at improving branch prediction efficiency [1]. These studies have largely focused on local predictors – particularly 2-bit saturating counters (see Figure 1) – which also serve as the basis for the models considered in this work.

2 Algorithms and their encoding using transducers

Throughout the article, indices start at 0: if u is a word of length |u| = n over the alphabet A, we represent it as u = u0 u1 ⋯ u_{n−1}, where each ui is a letter of u. We also use u[i] to denote the letter ui. If u = xyz where x, y and z are words, then x is a prefix of u, y is a factor and z is a suffix. A prefix (resp. suffix) of u is strict if it differs from u. For any i ∈ {0,…,n}, we denote by Pref(u,i) the prefix of u of length i. A (strict) border of u is a word v that is both a (strict) prefix and a (strict) suffix of u.

Algorithms MP and KMP

The Morris-Pratt (MP) and Knuth-Morris-Pratt (KMP) algorithms are textbook solutions to the pattern matching problem [4, 5, 3]. Both rely on precomputing a failure function, which helps identify candidate positions for the pattern X in the text W. The general approach involves scanning W from left to right, one letter at a time. Before moving to the next letter in W, the algorithm determines the longest prefix of X that is also a suffix of the discovered prefix of W. The failure function allows this computation to be performed efficiently.

The function mpX maps each prefix of X to its longest strict border, with the convention that mpX(ε) = ⊥. The function kmpX is a refinement of mpX defined by kmpX(X) = mpX(X), kmpX(ε) = ⊥, and for all prefixes uα of X, where u ∈ A* and α ∈ A, kmpX(u) is the longest strict suffix of u that is also a prefix of u, but such that kmpX(u)α is not a prefix of uα. If no such strict suffix exists, then kmpX(u) = ⊥. See [5] for a more detailed discussion of these failure functions. In the following, we only require that Algorithm FIND remains correct regardless of which failure function is used. Within the algorithm, the function b := mpX (or b := kmpX) is transformed into a precomputed integer-valued array B, defined as B[i] = |b(Pref(X,i))| for i ∈ {0,…,|X|}, with the convention that |⊥| = −1.
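
For concreteness, both failure tables can be computed by the classical linear-time procedures; the following Python sketch (helper names are ours) uses the convention B[0] = −1 for |⊥| = −1:

```python
def table_mp(X):
    """B[i] = |mpX(Pref(X, i))|: length of the longest strict border of X[:i], with B[0] = -1."""
    m = len(X)
    B = [-1] * (m + 1)
    k = -1
    for i in range(m):
        while k >= 0 and X[k] != X[i]:
            k = B[k]
        k += 1
        B[i + 1] = k
    return B

def table_kmp(X):
    """Refinement of table_mp: skip borders followed by the same letter as the mismatched one."""
    m = len(X)
    B = table_mp(X)
    K = B[:]                      # K[m] = B[m], since kmpX(X) = mpX(X)
    for i in range(1, m):
        if B[i] >= 0 and X[B[i]] == X[i]:
            K[i] = K[B[i]]
    return K

# For X = ababb:  table_mp("ababb")  == [-1, 0,  0, 1, 2, 0]
#                 table_kmp("ababb") == [-1, 0, -1, 0, 2, 0]
```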

Algorithm 1: FIND(X, W, B).

Algorithm FIND utilizes the precomputed table B from either mpX or kmpX to efficiently locate potential occurrences of X in W. The indices i and j denote the current positions in X and W, respectively. The main while loop (Line 3) iterates once for each letter discovered in W. At the start of each iteration, index i holds the length of the longest matching prefix of X. The inner while loop (Line 4) updates i using the precomputed table B. Finally, the if statement (Line 7) is triggered when an occurrence of X is found, updating i accordingly. For both MP and KMP, the table B can be computed in 𝒪(m) time and FIND runs in 𝒪(n) time. More precisely, in the worst case, Algorithm FIND performs at most 2n − m letter comparisons [4].
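
Since the listing of Algorithm 1 is not reproduced here, the following Python sketch reconstructs FIND from the description above (the line numbers in the comments refer to the numbering used in the text; the function and variable names are ours):

```python
def find(X, W, B):
    """Report all starting positions of the pattern X in the text W, given the
    precomputed table B (from table_mp or table_kmp)."""
    m, n = len(X), len(W)
    occurrences = []
    i, j = 0, 0
    while j < n:                             # Line 3: one iteration per letter of W
        while i >= 0 and X[i] != W[j]:       # Line 4: two short-circuited branches
            i = B[i]                         # follow the failure function
        i, j = i + 1, j + 1
        if i == m:                           # Line 7: an occurrence of X ends at position j
            occurrences.append(j - m)
            i = B[i]                         # restart from the longest border of X
    return occurrences
```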

Note that in any programming language supporting short-circuit evaluation of Boolean operators, the condition i ≥ 0 and X[i] ≠ W[j] at Line 4 of Algorithm FIND is evaluated as two separate jumps by the compiler. As a result, Algorithm FIND contains a total of four branches: one at Line 3, two at Line 4, and one at Line 7. In our model, each of these four branches is assigned a local predictor, and all may potentially lead to mispredictions. When a conditional instruction is compiled, it results in a jump that can correspond to either a taken or a not-taken branch. (Most assembly languages provide both jump-if-true and jump-if-false instructions. For consistency, we define a successful condition – when the test evaluates to true – as always leading to a taken branch. This convention does not affect our analysis, as the predictors we consider are symmetric.)

Associated automata

At the beginning of each iteration of the main while loop of Algorithm FIND, the prefix of length j of W (i.e. from letter W[0] to letter W[j−1]) has been discovered, and i is the length of the longest suffix of Pref(W,j) that is a strict prefix of X: we cannot have i = m, because when it happens, the pattern is found and i is directly updated to B[i], Line 7.

The evolution of i at each iteration of the main while loop is encoded by a deterministic and complete automaton 𝒜X. Its set of states is the set QX of strict prefixes of X, identified by their unique lengths if necessary. Its transition function δX maps a pair (u,α) to the longest suffix of uα which is in QX. Its initial state is ε. If X = Yα, where α ∈ A is a letter, then when following the path labeled by W starting from the initial state of 𝒜X, there is an occurrence of X in W exactly when the transition Y --α--> δX(Y,α) is used. This variant of the classical construction is more relevant for this article than the usual one [4, Sec. 7.1], which also has the state X. The transition Y --α--> δX(Y,α) is the accepting transition used to identify occurrences of X. The automaton 𝒜X tracks the value of i at the beginning of each iteration of the main loop, where i corresponds to the length of the current state label. This value remains the same for both MP and KMP. An example of 𝒜X is depicted in Figure 2.

Figure 2: The deterministic and complete automaton 𝒜X for X=ababb.
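
The transition function δX can be tabulated directly from its definition; a naive Python sketch, with states identified by their lengths, could look as follows (the helper name automaton is ours):

```python
def automaton(X, A):
    """delta[(i, a)] = length of the longest suffix of X[:i] + a that is a strict prefix of X."""
    m = len(X)
    delta = {}
    for i in range(m):
        for a in A:
            w = X[:i] + a
            k = len(w)
            # largest k < m such that X[:k] is a suffix of w
            while not (k < m and X[:k] == w[len(w) - k:]):
                k -= 1
            delta[(i, a)] = k
    return delta

# automaton("ababb", "ab") reproduces the automaton of Figure 2; the accepting
# transition is the one leaving state 4 (= abab) with the letter b.
```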

To refine the simulation of the Find algorithm using automata, we incorporate the failure functions. This is achieved by constructing the failure automaton. Specifically, for Algorithm MP, let Xmp be the automaton defined by:

  • A state set QX ∪ {⊥} and an initial state ε.

  • Transitions ⊥ --α--> ε for every α ∈ A.

  • Transitions u --α--> uα for every u ∈ QX such that uα ∈ QX.

  • A failure transition u ⇝ mpX(u) for every u ∈ QX, used when attempting to read a letter α such that uα ∉ QX.

The automaton Xkmp associated with KMP is identical to Xmp, except that its failure transitions are defined as u ⇝ kmpX(u) for every u ∈ QX. Both automata serve as graphical representations of the failure functions mpX and kmpX, structured in a way that aligns with 𝒜X. An example of Xmp and Xkmp is illustrated in Figure 3.

Figure 3: The automata Xmp and Xkmp for X = ababb, on the same picture; the failure transitions of Xmp are in dotted red lines and above, those of Xkmp are in dashed blue lines and below. To read the letter a from state aba in Xmp, one follows the failure transition aba ⇝ a, then a ⇝ ε, until one can finally read ε --a--> a. In Xkmp, only one failure transition aba ⇝ ε is needed, instead of two.

When reading a letter α from a state u in Xmp or Xkmp, if the transition u --α--> uα does not exist, the automaton follows failure transitions until a state with an outgoing transition labeled by α is found. This process corresponds to a single backward transition in 𝒜X. Crucially, using a failure transition directly mirrors the execution of the nested while loop in Algorithm FIND (Line 4) or triggers the if statement at Line 7 when an occurrence of X is found. This construction captures what we need for the forthcoming analysis.

3 Expected number of letter comparisons for a given pattern

In this section, we refine the analysis of the expected number of letter comparisons performed by Algorithm FIND for a given pattern X of length m and a random text W of length n. The average-case complexity of classical pattern matching algorithms has been explored before, particularly in scenarios where both the pattern and the text are randomly generated. Early studies [12, 13] examined the expected number of comparisons in Algorithms MP and KMP under memoryless or Markovian source models, employing techniques from analytic combinatorics. Prior attempts based on Markov chains introduced substantial approximations, limiting their accuracy compared to more refined combinatorial methods [12, 13]. Here, we refine and extend this Markov chain-based methodology, providing a more precise foundation for analyzing the expected number of mispredictions (see Section 4).

Let π be a probability measure on A such that for all α ∈ A, 0 < π(α) < 1 (we also use the notation πα := π(α) in formulas when convenient). For each n ≥ 0 and each W ∈ A^n, we define πn(W) := ∏_{i=0}^{n−1} π(Wi). For any n, the measure πn is a probability measure on A^n, where all letters are chosen independently following π. We obtain the uniform distribution on A^n if π is the uniform distribution on A, with π(α) = 1/|A| for all α ∈ A.

Encoding the letter comparisons with transducers

Letter comparisons occur at Line 4 only if i ≥ 0, due to the lazy evaluation of the and operator (when i < 0, W[j] is not compared to X[i]). We can encode these comparisons within the automata of Section 2 by adding outputs to the transitions, thereby transforming them into transducers. In both Xmp and Xkmp, each transition u --α--> uα corresponds to matching letters, meaning the test X[i] ≠ W[j] evaluates to false. We denote this with the letter N for a not taken branch. Conversely, following a failure transition indicates that the test X[i] ≠ W[j] is true, which we denote by T for taken. Transitions from ⊥ correspond to cases where i ≥ 0 is false, meaning no letter comparisons occur, as noted above. This construction is illustrated in Figure 4.

Figure 4: The automata Xmp and Xkmp transformed into transducers by adding the result of letter comparisons in Find as output of each transition.

We keep track of the results of the comparisons X[i] ≠ W[j] in 𝒜X by simulating the reading of each letter in the transducer associated with Xmp and concatenating the outputs. This transforms 𝒜X into the transducer 𝒯Xmp for MP, by adding an output function 𝒯Xmp to 𝒜X as follows (see Figure 5 for an example).

𝒯Xmp(u --α-->) = N if uα ∈ QX or uα = X; T if uα ∉ QX and mp(u) = ⊥; T · 𝒯Xmp(mp(u) --α-->) otherwise.    (1)

Replacing mp with kmp in Equation (1) yields the output function of the transducer 𝒯Xkmp associated with KMP.

Figure 5: The transducers 𝒯Xmp and 𝒯Xkmp for X = ababb. The only difference between them lies in the transition aba --a--> a, for which Algorithm MP uses one more letter comparison.
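
The recursive definition (1) translates directly into code; the sketch below (names are ours) computes the output of a single transition from the integer border table, and therefore works for both MP and KMP:

```python
def output_word(X, B, i, a):
    """Output of the transition u --a--> in the transducer, for u = X[:i] (Equation (1));
    B comes from table_mp or table_kmp, with B[i] = -1 encoding a border equal to ⊥."""
    if X[i] == a:                              # uα ∈ QX or uα = X: matching letters
        return "N"
    if B[i] == -1:                             # mismatch and the border of u is ⊥
        return "T"
    return "T" + output_word(X, B, B[i], a)    # T · output of the border, as in (1)

# output_word("ababb", table_mp("ababb"), 3, "a")  == "TTN"   (3 comparisons for MP)
# output_word("ababb", table_kmp("ababb"), 3, "a") == "TN"    (2 comparisons for KMP)
```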

Recall that the output of a path in a transducer is the concatenation of the outputs of its transitions. As the transducers 𝒯Xmp and 𝒯Xkmp are (input-)deterministic and complete, the output of a word is the output of its unique path that starts at the initial state. From the classical link between 𝒜X and Algorithm FIND [4] we have the following key statement.

Lemma 1.

The sequence of results of the comparisons X[i] ≠ W[j] when applying Algorithm FIND to the pattern X and text W is equal to the output of the word W in the transducer 𝒯Xmp for Algorithm MP, and in the transducer 𝒯Xkmp for KMP.

State probability and expected number of comparisons

Since we can use exactly the same techniques, from now on we focus on KMP for the presentation. Recall that if we reach the state u after reading the first j letters of W in 𝒜X, and hence in 𝒯Xkmp, then at the next iteration of the main while loop, index i contains the value |u|. For u ∈ QX and j ∈ {0,…,n−1}, we are thus interested in the probability pX(j,u) that after reading Pref(W,j) in 𝒯Xkmp we end in state u. Slightly abusing notation, we write π(u) = π_{|u|}(u). For any u ∈ QX, let bord(u) denote the longest strict border of u, with the convention that bord(ε) = ⊥.

Lemma 2.

For any u ∈ QX and any j ≥ m, pX(j,u) does not depend on j and we have pX(j,u) = pX(u) with pX(u) := π(u) − Σ_{v ∈ QX, bord(v) = u} π(v).

From Lemma 2 we can easily estimate the expected number of comparisons for any fixed pattern X, when the length n of W tends to infinity. Indeed, except when j < m, the probability pX(j,u) does not depend on j. Moreover, if we are in state u, from the lengths of the outputs of 𝒯Xkmp we can directly compute the expected number of comparisons during the next iteration of the main while loop. See Figure 6 for a graphical representation.

Figure 6: A graphical representation for the computation of the expected number of comparisons for the uniform distribution on {a,b}: in 𝒯Xmp and 𝒯Xkmp the state labels u have been changed into their probabilities pX(u), the letters into their probabilities 1/2, and the outputs into their lengths. For instance, the probability to use the transition aba --a--> a is (2/16)·(1/2) = 1/16, and it yields 3 comparisons for MP or 2 for KMP. Hence its contribution to CX in Proposition 3 is 3/16 or 1/8.
Proposition 3.

As n → ∞, the expected number of letter comparisons performed by Algorithm FIND with KMP (or MP, using 𝒯Xmp) is asymptotically equivalent to CX·n, where

CX = Σ_{u∈QX} pX(u) Σ_{a∈A} π(a) |𝒯Xkmp(u --a-->)|, and 1 ≤ CX ≤ 2.

Observe that Lemma 2 can also be derived by transforming 𝒯Xkmp into a Markov chain and computing its stationary distribution [8]. However, Lemma 2 provides a more direct and simpler formula, which appears to have gone unnoticed in the literature. Markov chains will prove very useful in Section 4.2.
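
Lemma 2 and Proposition 3 reduce the computation of CX to elementary operations; the following sketch (our helper names, exact rational arithmetic, reusing table_mp / table_kmp from Section 2) implements them:

```python
from fractions import Fraction

def prob_word(u, pi):
    """π(u): product of the letter probabilities of u."""
    p = Fraction(1)
    for a in u:
        p *= pi[a]
    return p

def longest_strict_border(u):
    for k in range(len(u) - 1, -1, -1):
        if u[:k] == u[len(u) - k:]:
            return u[:k]
    return None                    # bord(ε) = ⊥

def state_probabilities(X, pi):
    """pX(u) = π(u) − Σ_{v ∈ QX, bord(v) = u} π(v)   (Lemma 2)."""
    states = [X[:i] for i in range(len(X))]          # QX: strict prefixes of X
    return {u: prob_word(u, pi) - sum(prob_word(v, pi) for v in states
                                      if longest_strict_border(v) == u)
            for u in states}

def comparisons_per_step(X, B, i, a):
    """Number of evaluations of X[i] != W[j] in one iteration of the main loop."""
    count = 0
    while i >= 0:
        count += 1
        if X[i] == a:
            break
        i = B[i]
    return count

def expected_comparisons(X, pi, B):
    """CX of Proposition 3."""
    p = state_probabilities(X, pi)
    return sum(p[u] * sum(pa * comparisons_per_step(X, B, len(u), a)
                          for a, pa in pi.items()) for u in p)

# Example: uniform binary source, pi = {"a": Fraction(1, 2), "b": Fraction(1, 2)};
# expected_comparisons("ababb", pi, table_kmp("ababb")) gives CX for KMP on ababb.
```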

4 Expected number of mispredictions

We now turn to our main objective: a theoretical analysis of the number of mispredictions for a fixed pattern X and a random text W. The analysis depends on the type of predictor used (see Section 1). In what follows, all results are stated for local 2-bit saturated counters, such as the one in Figure 1. Let ξ denote its transition function, extended to words over {N,T}. For example, ξ(ν,NNNTT) = τ. Additionally, let μ(λ,s) denote the number of mispredictions encountered when following the path in the predictor starting from state λ ∈ {ν¯,ν,τ,τ¯} and labeled by s ∈ {N,T}*. For instance, μ(ν,NNNTT) = 2.

As previously noted, Algorithm FIND contains four branches in total: one at Line 3, two at Line 4, and one at Line 7. Each of these branches is assigned a local predictor, and all have the potential to generate mispredictions. The mispredictions generated by the main while loop (i.e. Line 3) are easily analyzed. Indeed, the test holds true n times and then becomes false once. Hence, the sequence of taken/not taken outcomes for this branch is T^n N. Therefore, starting from any state of the 2-bit saturated predictor, at most three mispredictions can occur. This contribution is asymptotically negligible, as we will demonstrate that the other branches produce a linear number of mispredictions on average.

4.1 Mispredictions of the counter update

We analyze the expected number of mispredictions induced by the counter update at Line 7. The sequence s of taken/not-taken outcomes for this if statement is defined by s_j = T if and only if Pref(W,j) ends with the pattern X, for all j ∈ {0,…,n−1}. This is easy to analyze, especially when the pattern X is not the repetition of a single letter. Proposition 4 establishes that, on average, there is approximately one misprediction for each occurrence of the pattern in the text.

Proposition 4.

If X contains at least two distinct letters, then the expected number of mispredictions caused by the counter update is asymptotically equivalent to π(X)·n.

Proof (sketch).

Since X contains at least two distinct letters, it cannot be a suffix of both Pref(W,j) and Pref(W,j+1). Hence, the sequence s is of the form (N⁺T)*N*. This means that every step to the right in the local predictor (for every T in the sequence s), which corresponds to a match, is followed by a step to the left, except possibly for the last step. Thus, if the local predictor reaches state ν¯, it remains in ν or ν¯ forever. Having three consecutive positions in W without an occurrence of X is sufficient to reach state ν¯. With high probability, this happens within the first 𝒪(log n) iterations, and from that point on there is exactly one misprediction each time the pattern is found. This concludes the proof, as the expected number of occurrences of X in W is asymptotically equivalent to π(X)·n.

The analysis of the case X = α^m, where X consists of a repeated single letter, is more intricate. We first present the proof sketch for X = αα, which captures all the essential ideas. Write W = β1 β2 ⋯ βr α^x, where each βi = α^{ki} ᾱi with ki ≥ 0 and ᾱi ∈ A ∖ {α}. Depending on the value of ki, one can compute the sequence of taken/not taken outcomes induced by a factor α^{ki} ᾱ, which is either preceded by a letter of A ∖ {α} or by nothing: ᾱ yields N, αᾱ yields NN, α²ᾱ yields NTN, and so on. Thus, more generally, ᾱ yields N and α^{ki} ᾱ yields N T^{ki−1} N for ki ≥ 1. We then examine the state of the predictor and the number of mispredictions produced after each factor βi is read. For instance, if just before reading βi = α³ᾱ the predictor state is ν, then the associated sequence NTTN produces three mispredictions and the predictor ends in the same state ν, which can be seen on the path ν --N--> ν¯ --T(misp.)--> ν --T(misp.)--> τ --N(misp.)--> ν. Since τ¯ cannot be reached except at the very beginning or at the very end, it has a negligible contribution to the expectation, and we can list all the relevant possibilities as follows:

| start \ factor | k = 0 (N)   | k = 1 (NN)  | k = 2 (NTN) | k = 3 (NTTN) | k ≥ 4 (N T^{k−1} N) |
|----------------|-------------|-------------|-------------|--------------|---------------------|
| ν¯             | ν¯, 0 misp. | ν¯, 0 misp. | ν¯, 1 misp. | ν, 3 misp.   | τ, 3 misp.          |
| ν              | ν¯, 0 misp. | ν¯, 0 misp. | ν¯, 1 misp. | ν, 3 misp.   | τ, 3 misp.          |
| τ              | ν, 1 misp.  | ν¯, 1 misp. | ν, 3 misp.  | τ, 3 misp.   | τ, 3 misp.          |

In the table above, the states ν¯ and ν produce identical outcomes and can therefore be merged into a single state, denoted ν~, for the analysis. The resulting transitions form a graph with two vertices, which is then converted into a Markov chain by incorporating the probabilities of the factors α^k ᾱ, as illustrated in Figure 7.

Figure 7: On the left, the transition system determined by the factor α^k ᾱ; on the right, the corresponding Markov chain. For clarity, we denote p := π(α).

The stationary distribution π0 of this Markov chain is straightforward to compute, yielding π0(ν~) = (1−p³)/(1−p³+p⁴) and π0(τ) = p⁴/(1−p³+p⁴), where p := π(α). From each state, the expected number of mispredictions can be computed using the transition table. For instance, starting from ν~, one misprediction occurs when k = 2, with probability (1−p)p², and three mispredictions occur when k ≥ 3, with probability p³. Therefore, the expected number of mispredictions when reading the next factor α^k ᾱ from ν~ is given by (1−p)p² + 3p³. Finally, with high probability there are around (1−p)n factors of the form α^k ᾱ in the decomposition of W, which corresponds to roughly the same number of steps in the Markov chain. The general statement for X = α^m is as follows.
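
This small computation can be checked symbolically; the following sympy snippet (a sketch of ours, not the authors' notebook) recovers the expression of κ2 stated in Proposition 5:

```python
import sympy as sp

p = sp.symbols("p", positive=True)

# Two-state Markov chain (merged state ~ν, state τ), one step per factor α^k ᾱ,
# where k is geometric: P(k) = p**k * (1 - p).
to_tau_from_nu = p**4          # from ~ν, the predictor ends in τ iff k >= 4
stay_tau = p**3                # from τ, it stays in τ iff k >= 3
pi_nu = (1 - stay_tau) / (1 - stay_tau + to_tau_from_nu)     # stationary distribution
pi_tau = to_tau_from_nu / (1 - stay_tau + to_tau_from_nu)

# Expected mispredictions while reading one factor, read off the table above.
m_nu = (1 - p) * p**2 + 3 * p**3
m_tau = (1 - p) * (1 + p + 3 * p**2 + 3 * p**3) + 3 * p**4

# A factor α^k ᾱ covers 1/(1-p) text symbols on average, hence (1-p) factors per symbol.
kappa2 = sp.simplify((1 - p) * (pi_nu * m_nu + pi_tau * m_tau))
assert sp.cancel(kappa2 - p**2 * (1 - p) * (1 + 2*p + p**2 - p**3) / (1 - p**3 + p**4)) == 0
```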

Proposition 5.

If X = α^m, the expected number of mispredictions caused by the counter update is asymptotically κm(π(α))·n, with κm(p) = p^m(1−p)(1+p)² for m ≥ 3, and

κ1(p) = p(1−p) / (1 − 2p(1−p)), and κ2(p) = p²(1−p)(1+2p+p²−p³) / (1 − p³ + p⁴).

4.2 Expected number of mispredictions during letter comparisons

In this section, we analyze the expected number of mispredictions caused by letter comparisons in KMP (similar results can be derived for MP).

According to Lemma 1, the outcome of letter comparisons in KMP is encoded by the transducer 𝒯Xkmp. More precisely, following a transition u --α:s--> v in this transducer simulates a single iteration of the main loop of Algorithm FIND, starting with i = |u| and processing the letter α := W[j]. At the end of this iteration, i = |v|, and s ∈ {N,T}* is the sequence of taken/not-taken outcomes for the test X[i] ≠ W[j].

The mispredictions occurring during this single iteration of the main loop depend on the predictor’s initial state λ and the sequence s, which is computed using 𝒯Xkmp. The number of mispredictions μ(λ,s) is retrieved by following the path starting from state λ and labeled by s in the predictor, which ends in the state ξ(λ,s). This is formalized by using a coupling of 𝒯Xkmp with the predictor of Figure 1, forming a product transducer 𝒫Xkmp, defined as follows (see Figure 8 for an example):

  • the set of states is QX×{ν¯,ν,τ,τ¯},

  • there is a transition (u,λ) --α : μ(λ,s)--> (δX(u,α), ξ(λ,s)) for every state (u,λ) and every letter α, where s is the output of the transition u --α:s--> δX(u,α) in 𝒯Xkmp.

By construction, at the beginning of an iteration of the main loop in Algorithm FIND, if i = |u|, λ is the initial state of the 2-bit saturated predictor, and α = W[j], then, during the next iteration, μ(λ,s) mispredictions occur, and the predictor terminates in state ξ(λ,s), where u --α:s--> δX(u,α) in 𝒯Xkmp. This leads to the following statement.

Lemma 6.

The number of mispredictions caused by letter comparisons in KMP, when applied to the text W and the pattern X, is given by the sum of the outputs along the path that starts at (ε,λ0) and is labeled by W in 𝒫Xkmp, where λ0 ∈ {ν¯,ν,τ,τ¯} is the initial state of the local predictor associated with the letter comparison.

Figure 8: The strongly connected terminal component of 𝒫Xkmp in black and blue, for X=ababb. In black and red, the variant for 𝒫Xmp.

We can then proceed as in Proposition 5: the transducer 𝒫Xkmp is converted into a Markov chain by assigning a weight of π(α) to the transitions labeled by a letter α. From this, we compute the stationary distribution π0 over the set of states, allowing us to determine the asymptotic expected number of mispredictions per letter of W. This quantity, LX, satisfies

LX = Σ_{u∈QX} Σ_{λ∈{ν¯,ν,τ,τ¯}} π0(u,λ) × Σ_{α∈A} π(α) 𝒫Xkmp((u,λ) --α-->).    (2)

Observe that when processing a long sequence of letters different from X[0], the letter comparisons produce a sequence of T’s, causing the 2-bit saturated predictor to settle in state τ¯ while i=0 in the algorithm. Consequently, the state (ε,τ¯) is reachable from every other state. Hence, the Markov chain has a unique terminal strongly connected component (i.e. there are no transitions from any vertex in this strongly connected component to any vertex outside of it), which includes (ε,τ¯) along with a self-loop at this state. Thus, our analysis focuses on this component, allowing us to apply classical results on primitive Markov chains [8], ultimately leading to Equation (2). Notably, this result is independent of the predictor’s initial state. The computation of LX can be easily carried out using computer algebra, since computing the stationary probability reduces to inverting a matrix.
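
Equation (2) can be evaluated symbolically by building exactly this Markov chain; the sketch below (our helper names, reusing run_predictor from the Introduction and the border tables of Section 2) does so with sympy:

```python
import sympy as sp

def step_main_loop(X, B, i, a):
    """One iteration of FIND's main loop from prefix length i reading the letter a:
    returns the next prefix length and the outcomes of the test X[i] != W[j]."""
    s = ""
    while i >= 0 and X[i] != a:
        s += "T"
        i = B[i]
    if i >= 0:
        s += "N"
    i += 1
    return (B[i], s) if i == len(X) else (i, s)

def expected_mispredicted_comparisons(X, pi, B):
    """LX of Equation (2), for a memoryless source pi (dict letter -> sympy probability)."""
    states = [(i, lam) for i in range(len(X)) for lam in range(4)]
    index = {q: k for k, q in enumerate(states)}
    P = sp.zeros(len(states), len(states))   # product chain on QX x {predictor states}
    M = sp.zeros(len(states), 1)             # expected mispredictions from each state
    for (i, lam), k in index.items():
        for a, pa in pi.items():
            j, s = step_main_loop(X, B, i, a)
            lam2, misp = run_predictor(lam, s)
            P[k, index[(j, lam2)]] += pa
            M[k] += pa * misp
    v = (P.T - sp.eye(len(states))).nullspace()[0]   # left fixed vector of P
    return sp.simplify((v.T * M)[0, 0] / sum(v))     # normalize to the stationary law

# With pa = sp.Rational(1, 2), expected_mispredicted_comparisons("ab", {"a": pa, "b": pa},
# table_kmp("ab")) should give 4/7 ≈ 0.571, the ab entry of Table 2.
```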

Proposition 7.

The expected number of mispredictions caused by letter comparisons in KMP, on a random text of length n with a given pattern X, is asymptotically equivalent to LX·n.

4.3 Expected number of mispredictions of the test i ≥ 0

We conclude the analysis by examining the mispredictions caused by the test i ≥ 0 at Line 4 of Algorithm FIND. To this end, we use the previously constructed transducer 𝒯Xkmp (or equivalently 𝒯Xmp, as the approach remains the same) to capture the behavior of this test through a straightforward transformation of the outputs. Recall that a transition u --α:s--> v in 𝒯Xkmp, with s ∈ {N,T}*, indicates that when reading the letter α, the inner while loop performs |s| character comparisons, with the results encoded by the symbols of s. Due to the loop structure, s always takes one of two forms:

  • s ∈ T*N, and the loop terminates because X[i] = W[j] eventually, or

  • s ∈ T⁺, and the loop terminates because i = −1 eventually.

In the first case, the condition i ≥ 0 holds for |s| iterations. In the second case, the condition holds for |s| iterations before failing once. Thus, we define the transducer 𝒯~Xkmp identically to 𝒯Xkmp, except for its output function:

𝒯~Xkmp(u --α-->) = T^|s| if s := 𝒯Xkmp(u --α-->) ∈ T*N; T^|s| N if s := 𝒯Xkmp(u --α-->) ∈ T⁺.    (3)

The same transformation can be applied to 𝒯Xmp for MP. At this stage, we could directly reuse the framework from Section 4.2 to compute the asymptotic expected number of mispredictions for any given pattern X. However, a shortcut allows for a simpler formulation while offering deeper insight into the mispredictions caused by the test i ≥ 0.

Since each output is either T^k N for some k ≥ 1 or T^k, the local predictor state generally moves toward τ¯, except in the case of TN. In this latter case, the predictor either remains in the same state or transitions from τ¯ to τ. Moreover, from any state u of 𝒜X, there always exists a letter α such that u --α:T--> in 𝒯~Xmp or 𝒯~Xkmp (for instance, the transition that goes to the right, or the one used when the pattern is found). As a result, with high probability, the predictor reaches the state τ¯ in at most 𝒪(log n) iterations of the main loop of Algorithm FIND. Once in τ¯, the predictor remains confined to the states τ and τ¯ indefinitely. Thus, with high probability, except for a small number of initial steps, the predictor consistently predicts that the branch is taken. At this point, a misprediction occurs if and only if the output belongs to T⁺N, which happens precisely when a non-accepting transition in 𝒯~Xkmp leads to the state ε. Since 𝒯~Xkmp and 𝒯~Xmp differ only in their output functions, this result holds for both MP and KMP, allowing us to work directly with 𝒜X. Applying Lemma 2, we obtain the following statement.

Proposition 8.

When Algorithms MP or KMP are applied to a random text W of length n with a given pattern X, the expected number of mispredictions caused by the test i ≥ 0 is equal to the expected number of times a transition ending in ε is taken along the path labeled by W in 𝒜X, up to an error term of 𝒪(log n). As a result, the expected number of such mispredictions is asymptotically equivalent to GX·n, where GX = Σ_{u∈QX} pX(u) Σ_{α : u --α--> ε, uα ≠ X} π(α).
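
Proposition 8 likewise translates into a few lines (reusing automaton and state_probabilities from the earlier sketches; names are ours):

```python
def expected_mispredicted_tests(X, pi):
    """GX of Proposition 8: probability, per text symbol, of using a non-accepting
    transition of the automaton A_X that ends in the state ε."""
    delta = automaton(X, set(pi))
    p = state_probabilities(X, pi)
    return sum(p[X[:i]] * sum(pa for a, pa in pi.items()
                              if delta[(i, a)] == 0 and X[:i] + a != X)
               for i in range(len(X)))

# With pi = {"a": Fraction(1, 2), "b": Fraction(1, 2)}, expected_mispredicted_tests("ab", pi)
# returns 1/4 = (1 - p)**2, matching the ab entry of Table 1.
```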

5 Results for small patterns, discussion and perspectives

Table 1: We analyze the asymptotic expected number of mispredictions per symbol in the text for each branch of Algorithm FIND, considering Σ = {a,b} and all normalized patterns of length 2 and 3. To improve readability, we introduce p := π(a) = 1 − π(b). Notably, for the patterns X = ab and X = abb, the failure functions of Algorithms MP and KMP are identical, making both variants of Algorithm FIND behave the same in these cases. The functions κ2 and κ3 are defined in Proposition 5.
| X   | i = m    | i ≥ 0        | Algo. | X[i] ≠ T[j]                                  |
|-----|----------|--------------|-------|----------------------------------------------|
| aa  | κ2(p)    | 1 − p        | MP    | p(1−p)(1+2p) / (1−p²+p³)                     |
|     |          |              | KMP   | p(1−p) / (1−2p+2p²)                          |
| ab  | p(1−p)   | (1−p)²       | both  | p(3−7p+7p²−2p³) / (1−p+2p²−p³)               |
| aaa | κ3(p)    | 1 − p        | MP    | p(1−p)(1+p)²                                 |
|     |          |              | KMP   | p(1−p) / (1−2p+2p²)                          |
| aab | p²(1−p)  | (1−p)²(1+p)  | MP    | p(1+2p−p²−8p³+6p⁴+5p⁵−5p⁶+p⁷)                |
|     |          |              | KMP   | p(1−2p²−p³+5p⁴−3p⁵+p⁶) / (1−2p+3p²−2p³+p⁴)   |
| aba | p²(1−p)  | (1−p)²       | MP    | p(3−7p+8p²−4p³+p⁴) / (1−p+p²)                |
|     |          |              | KMP   | p(3−7p+7p²−2p³) / (1−p+2p²−p³)               |
| abb | p(1−p)²  | (1−p)³       | both  | p(4−13p+21p²−16p³+6p⁴−p⁵)                    |

We conducted a comprehensive study of local branch prediction for MP and KMP and provide the code (a Python notebook using sympy, available at https://github.com/vialette/branch-prediction/) that allows one to quantify mispredictions for any alphabet size, any given pattern and any memoryless source for the input text (as for the examples given in Table 1).

Notably, the expressions for the number of mispredicted letter comparisons become increasingly complex as the pattern length grows and as the alphabet size increases. For instance, for the pattern X=abab, with πa:=π(a) and πb:=π(b), we obtain:

Labab = πa (πa³πb + 2πa²πb³ + 4πa²πb² + 3πa²πb + πa² − 5πaπb² − 4πaπb − 2πa + 2πb + 1) / ((1 − πa)(πa²πb² + πa²πb − πaπb − πa + 1)).

The results given in Table 2 illustrate this for the uniform distribution, for small patterns and alphabets. In particular, the branch i ≥ 0, which is poorly predicted by its local predictor, exhibits a very high number of mispredictions when |A| = 4, while the branch that comes from letter comparisons, X[i] ≠ T[j], experiences fewer mispredictions. This trend becomes more pronounced as the size of the alphabet increases: for X = abb and |A| = 26, the misprediction rate for the test i ≥ 0 reaches 0.96, whereas for X[i] ≠ T[j], it drops to 0.041.
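
As a sanity check, these constants can also be estimated by direct simulation; the following sketch (ours, with one 2-bit counter per branch and ignoring the negligible Line 3 branch) measures misprediction rates per text symbol:

```python
import random

def simulate(X, W, B):
    """Run FIND on W with a 2-bit saturating counter attached to each of the three
    relevant branches, and return the observed misprediction rates per text symbol."""
    counters = {"i == m": 0, "i >= 0": 0, "X[i] != W[j]": 0}
    misses = {name: 0 for name in counters}

    def branch(name, taken):
        if taken != (counters[name] >= 2):
            misses[name] += 1
        counters[name] = min(counters[name] + 1, 3) if taken else max(counters[name] - 1, 0)
        return taken

    m, i = len(X), 0
    for j in range(len(W)):
        while branch("i >= 0", i >= 0) and branch("X[i] != W[j]", X[i] != W[j]):
            i = B[i]
        i += 1
        if branch("i == m", i == m):
            i = B[i]
    return {name: misses[name] / len(W) for name in misses}

# e.g. simulate("ab", "".join(random.choice("ab") for _ in range(10**6)), table_kmp("ab"))
# should be close to the ab row of Table 2 for |A| = 2.
```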

Our work presents an initial theoretical exploration of pattern matching algorithms within computational models enhanced by local branch prediction. However, modern processors often employ hybrid prediction mechanisms that integrate both local and global predictors, with global predictors capturing correlations between branch outcomes across different execution contexts – in our simulations with PAPI (https://icl.utk.edu/papi/) on a personal computer, the actual number of mispredictions is roughly divided by |A| in practice. A key direction for further research is to develop a theoretical model that incorporates both predictors, allowing for more precise measurements in real-world scenarios. Another important line of research is to account for enhanced probabilistic distributions for texts, as real-world texts are often badly modeled by memoryless sources. For instance, Markovian sources should be manageable within our model and could provide a more accurate framework for the analysis.

Table 2: Asymptotic expected number of mispredictions per input symbol in a random text W, using Algorithm FIND, assuming a uniform distribution over alphabets of size 2 and 4.
Left half: |A| = 2; right half: |A| = 4.

| X   | i = m | i ≥ 0 | Algo | X[i] ≠ T[j] | Total | i = m | i ≥ 0 | Algo | X[i] ≠ T[j] | Total |
|-----|-------|-------|------|-------------|-------|-------|-------|------|-------------|-------|
| aa  | 0.283 | 0.5   | MP   | 0.571       | 1.353 | 0.073 | 0.75  | MP   | 0.295       | 1.117 |
|     |       |       | KMP  | 0.5         | 1.283 |       |       | KMP  | 0.3         | 1.123 |
| ab  | 0.25  | 0.25  | both | 0.571       | 1.321 | 0.062 | 0.688 | both | 0.375       | 1.186 |
| aaa | 0.14  | 0.5   | MP   | 0.563       | 1.202 | 0.018 | 0.75  | MP   | 0.293       | 1.06  |
|     |       |       | KMP  | 0.5         | 1.14  |       |       | KMP  | 0.3         | 1.068 |
| aab | 0.125 | 0.375 | MP   | 0.605       | 1.23  | 0.015 | 0.734 | MP   | 0.322       | 1.086 |
|     |       |       | KMP  | 0.542       | 1.166 |       |       | KMP  | 0.322       | 1.086 |
| aba | 0.125 | 0.25  | MP   | 0.708       | 1.083 | 0.015 | 0.688 | MP   | 0.367       | 1.068 |
|     |       |       | KMP  | 0.571       | 0.946 |       |       | KMP  | 0.375       | 1.076 |
| abb | 0.125 | 0.125 | both | 0.547       | 0.921 | 0.015 | 0.672 | both | 0.397       | 1.098 |

References

  • [1] Nicolas Auger, Cyril Nicaud, and Carine Pivoteau. Good predictions are worth a few comparisons. In Nicolas Ollinger and Heribert Vollmer, editors, 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17-20, 2016, Orléans, France, volume 47 of LIPIcs, pages 12:1–12:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.STACS.2016.12.
  • [2] Gerth Stølting Brodal and Gabriel Moruz. Tradeoffs between branch mispredictions and comparisons for sorting algorithms. In Frank K. H. A. Dehne, Alejandro López-Ortiz, and Jörg-Rüdiger Sack, editors, Algorithms and Data Structures, 9th International Workshop, WADS 2005, Waterloo, Canada, August 15-17, 2005, Proceedings, volume 3608 of Lecture Notes in Computer Science, pages 385–395. Springer, 2005. doi:10.1007/11534273_34.
  • [3] Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007.
  • [4] Maxime Crochemore and Wojciech Rytter. Text Algorithms. Oxford University Press, 1994.
  • [5] Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. doi:10.1017/CBO9780511574931.
  • [6] John L. Hennessy and David A. Patterson. Computer Architecture, Sixth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., 6th edition, 2017.
  • [7] Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977. doi:10.1137/0206024.
  • [8] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.
  • [9] Conrado Martínez, Markus E. Nebel, and Sebastian Wild. Analysis of branch misses in quicksort. In Proceedings of the Twelfth Workshop on Analytic Algorithmics and Combinatorics, ANALCO 2015, San Diego, CA, USA, January 4, 2015, pages 114–128, 2015. doi:10.1137/1.9781611973761.11.
  • [10] Sparsh Mittal. A survey of techniques for dynamic branch prediction. Concurrency and Computation: Practice and Experience, 31, 2018.
  • [11] James H. Morris, Jr. and Vaughan R. Pratt. A linear pattern-matching algorithm. Technical report, University of California, Berkeley, CA, January 1970.
  • [12] Mireille Régnier. Knuth-Morris-Pratt algorithm: an analysis. In Antoni Kreczmar and Grazyna Mirkowska, editors, Mathematical Foundations of Computer Science 1989, MFCS’89, Porabka-Kozubnik, Poland, August 28 - September 1, 1989, Proceedings, volume 379 of Lecture Notes in Computer Science, pages 431–444. Springer, 1989. doi:10.1007/3-540-51486-4_90.
  • [13] Mireille Régnier and Wojciech Szpankowski. Complexity of sequential pattern matching algorithms. In Michael Luby, José D. P. Rolim, and Maria J. Serna, editors, Randomization and Approximation Techniques in Computer Science, Second International Workshop, RANDOM’98, Barcelona, Spain, October 8-10, 1998, Proceedings, volume 1518 of Lecture Notes in Computer Science, pages 187–199. Springer, 1998. doi:10.1007/3-540-49543-6_16.

6 Appendix

In the first series of four figures (Figure 9, Figure 10, Figure 11 and Figure 12), we apply the analytical formulas developed in the article to compute the expected number of mispredictions for each branch, as well as the total number of mispredictions across the entire matching process. These computations are performed as a function of the character probability π(a), which varies in the interval [0,1], under the assumption of an i.i.d. binary text model over the alphabet {a,b}. Each figure corresponds to a different pattern: aaaa, aaab, abab, and abbb. Each plot was computed in a few seconds on a personal laptop.
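
A plot of the same kind can be produced directly from the closed forms of Table 1; the sketch below (ours, shown for the simpler pattern X = aa over {a,b}) uses matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.01, 0.99, 200)      # p = π(a), binary alphabet {a, b}

# Closed forms from Table 1 for X = "aa".
counter_update = p**2 * (1 - p) * (1 + 2*p + p**2 - p**3) / (1 - p**3 + p**4)   # i == m
test_i = 1 - p                                                                   # i >= 0
cmp_mp = p * (1 - p) * (1 + 2*p) / (1 - p**2 + p**3)                             # comparisons, MP
cmp_kmp = p * (1 - p) / (1 - 2*p + 2*p**2)                                       # comparisons, KMP

plt.plot(p, counter_update, label="i == m")
plt.plot(p, test_i, label="i >= 0")
plt.plot(p, cmp_mp, label="X[i] != T[j] (MP)")
plt.plot(p, cmp_kmp, label="X[i] != T[j] (KMP)")
plt.plot(p, counter_update + test_i + cmp_kmp, "--", label="total (KMP)")
plt.xlabel("p = π(a)")
plt.ylabel("expected mispredictions per text symbol")
plt.legend()
plt.show()
```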

Figure 9: Pattern X=aaaa.
Figure 10: Pattern X=aaab.
Figure 11: Pattern X=abab.
Figure 12: Pattern X=abbb. Both variants MP and KMP have the same border table.

In the last two figures (Figure 13 and Figure 14), we focus on a more detailed case study. We consider all non-trivial prefixes (i.e., of length at least 2) of the pattern abababb, and for each prefix, we compute the expected number of mispredictions per text symbol as the text length tends to infinity. This is done for both the MP and the KMP algorithms. The results suggest a form of convergence as the length of the prefix increases. This is consistent with theoretical expectations, as longer prefixes imply that deeper states of the automaton 𝒜X become increasingly less likely to be reached in practice.

Figure 13: Total number of mispredictions for the prefixes of abababb (KMP).
Figure 14: Total number of mispredictions for the prefixes of abababb (MP).