Negated String Containment Is Decidable
Abstract
We provide a positive answer to a long-standing open question of the decidability of the not-contains string predicate. Not-contains is practically relevant, for instance in symbolic execution of string manipulating programs. Particularly, we show that the predicate , where and are sequences of string variables constrained by regular languages, is decidable. Decidability of a not-contains predicate combined with chain-free word equations and regular membership constraints follows.
Keywords and phrases:
not-contains, string constraints, word combinatorics, primitive wordCopyright and License:
2012 ACM Subject Classification:
Theory of computation Regular languages ; Theory of computation Automated reasoning ; Theory of computation Logic and verificationAcknowledgements:
We thank the anonymous reviewers for careful reading of the paper and their suggestions that greatly improved its quality.Funding:
This work was supported by the Czech Ministry of Education, Youth and Sports ERC.CZ project LL1908, the Czech Science Foundation project 25-18318S, and the FIT BUT internal project FIT-S-23-8151. The work of Michal Hečko, a Brno Ph.D. Talent ScholarshipEditors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał SkrzypczakSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
String constraints have been recently intensely studied in relation to their applications in analysis of string manipulation in programs, e.g., in the analyses of security of web applications or cloud resource access policies [43]. Apart from a plethora of practical solvers, e.g., cvc5 [26, 27, 7, 28, 42, 38, 41], Z3 [23, 11, 33], Ostrich [29, 13, 16, 14, 15], Z3-Noodler [18, 17, 12], Trau [4, 2, 1] Z3Str/2/3/4/3RE [10, 9, 8], Woorpje [20], and nfa2sat [32], the theoretical landscape of string constraints has been intensely studied too. The seminal work of Makanin [35], establishing decidability of word equations, was followed by the work of Plandowski [39] (and later Jeż’s work on recompression) that placed the problem in PSpace. A number of relatively recent works study extensions of string constraints with constraints over string lengths, transducer-defined relational constraints, string-integer conversions, extensions of regular properties, replace-all, etc. As the extended string constraints are in general undecidable, these works focus on finding practically relevant decidable fragments such as the straight-line [16, 29, 13, 15, 14] and chain-free [4, 17] fragments, quadratic equations [36], and others (e.g., [5, 21]).
The most essential constraints, from the practical perspective, are considered to be word equations, regular membership constraints, length constraints, and also , as argued, e.g., in [44], and as can also be seen in benchmarks, for instance, in [45, 3]. While the three former types of constraints are intensely studied, was studied only little. Yet, it is important as well as theoretically interesting: besides the occurrence in existing benchmarks, its importance follows also from its ability to capture other highly practical types of constraints. E.g., the function should return the position of the first occurrence of in . It can be converted to the word equation after which the returned value equals . To ensure that is indeed the first occurrence in , there should be no occurrence of in where is the prefix of without the last symbol, i.e., for . This can be expressed as (e.g., Z3 solves in this way [23]).
As mentioned above, the problem is also interesting from the theoretical perspective. Although the positive version, , can be easily encoded using word equations, the negation is difficult. Its precise conversion to word equations would require universal quantification, which is undecidable for word equations in general [22]. The most systematic attempts at solving have been made in [3, 19]. In [3], the authors extend the flattening underapproximating framework behind the solver Trau [2, 1] and give a precise solution for if all involved string variables are constrained by flat languages (a flat language here stands for a finite union of concatenations of iterations of words) and, moreover, if no string variable appears multiple times, thus avoiding most of the difficulty of the problem. Our recent work [19], on top of which we build here, proceeds in a similar direction and removes the restriction of [3] on multiple occurrences of variables, but still requires all languages to be flat, which is a quite severe restriction. Practical heuristics used in solvers generally solve only easy cases and quickly fail on more complex ones, cf. [19], and do not give any guarantees. E.g., cvc5 translates into a universally quantified disequality [40], which is in turn handled by cvc5’s incomplete quantifier instantiation [37].
In this paper, we show decidability of a much more general kind of than [19, 3], namely of the form where eedle and aystack are string terms (sequences of symbols and variables) and constrains variables by any regular language. The constraint is satisfied by an assignment to string variables respecting under which is not a factor (i.e., a continuous subword) of (i.e., if eedle cannot be found in aystack).
Our solution of the problem leads relatively deep into word combinatorics and automata theory. We rely on the result in [19] giving a decision procedure for a flat-language version of the problem. The work [19] uses an automata-based construction inspired by deciding functional equivalence of streaming string transducers [6]. Using a variation on automata Parikh images, it transforms the problem into an equisatisfiable Presburger arithmetic formula (which is decidable). The general case with variables restricted by arbitrary regular languages, the subject of this paper, is solved by a reduction to this flat-language fragment. The core idea of our proof is that we can always find fresh primitive words in non-flat languages that can be repeated an arbitrary number of times. The result of such a repetition is a word that can share with other variables only subwords of a bounded size, assuming all words assigned to variables are sufficiently long. The reduction technically requires a dive into combinatorics on words and results on primitive words [31, 35, 34, 30], which are closely related to flat languages. Our techniques shares traits with the work of Karhumäki et al. [25], which constructs long primitive words to show that disjunctions of word equations can be encoded into a single equation. First, for variables with non-flat languages occurring on both sides of the constraint, we show that we can replace each of them with a single fresh symbol. This is because non-flat languages allow us to choose a sufficiently complex word for the variable that can be matched only with the value of on the other side ( is the other side of and vice versa). For variables with non-flat languages that appear only in , we show that after enumerating all possible assignments for them up to a certain bound, their languages can be underapproximated by flat languages while preserving satisfiability.
2 Preliminaries
Numbers.
We use for natural numbers (including zero). For , their greatest common divisor is denoted as and their least common multiple is denoted as .
Words.
An alphabet is a finite non-empty set of symbols. Let be fixed for the rest of the paper. A (finite) word over is a sequence of symbols from , where is the length of , denoted as . The empty word of the length 0 is denoted by and a concatenation of two words and is denoted as (or shortly ). An iteration of a word is defined as and for . The set of all words over is denoted as . A primitive word cannot be written as for any and , and we will use Greek letters from the beginning of the alphabet to denote primitive words. We denote the set of all primitive words . A word is a factor (i.e., a continuous subword) of every word . Given two words and , we say that the factors and have an overlap of size if . The overlap of and in the words and contains a conflict if there is a position with and such that the words and contain a different letter at position .
Languages.
A language over is a subset of . We will sometimes abuse notation and, given a word , use to also denote the language . For two languages and , we use (or just ) for their concatenation . A bounded iteration of a language is defined as and for . The (unbounded) iteration is . For a word we use () to denote the set of prefixes (suffixes) of and to denote the set of all factors of . We lift the definitions to languages as usual. A language is flat iff it can be expressed as a finite union
| (1) |
where every s.t. is a word over , else it is non-flat. Flatness of can be characterised by the absence of the so-called “butterfly loops”:
Fact 1.
A regular language is non-flat iff for some with for any word .
Automata.
A (nondeterministic finite) automaton (NFA) over is a tuple where is a set of states, is a set of transitions of the form with and , is the set of initial states, and is the set of final states. A run of over a word from state to state is a sequence of transitions , , , from . The empty sequence is a run with over . We denote by that has such a run, from where we drop the subscript if it is clear from the context. The run is accepting if and , and the language of is . Languages accepted by NFAs are called regular. is a deterministic finite automaton (DFA) if and for every symbol and every pair of transitions and in it holds that if then .
The constraint.
Let be a set of (string) variables. A term is a word over variables and symbols. A constraint is a formula , where and (for eedle and aystack; holds if we cannot find within ) are terms and associates every variable with a regular language . An assignment is a function , i.e., it assigns strings to variables. We use to denote the assignment obtained from by substituting the values of variables to respectively. We lift to terms so that for , we let , and for terms , we let . We then say that satisfies , written , if for every and cannot be written as for any , i.e., is not a factor of . We call a variable two-sided if it occurs in both and . Moreover, we use to denote the set of variables occurring in s.t. is a flat language.
Given a term , a variable , and a term , we use to denote the term obtained by substituting every occurrence of the variable in by the term . Moreover, we use to denote the set of variables with at least one occurrence in the term .
Theorem 2 ([19, Theorem 7.5]).
Satisfiability of the constraint is NP-hard.
2.1 Normalization
A variable is flat (non-flat) if the language associated with is flat (non-flat), respectively, and finite if its corresponding language is finite. Moreover, a variable is called decomposed if its language can be represented by a DFA having a single initial, single final state, and containing exactly one nontrivial maximal strongly connected component (SCC) and no other SCCs. We say that is normalized if it contains an occurrence of at least one variable, does not contain any finite variable, and all of its variables are decomposed. Any constraint can be transformed into a disjunction of normalized constraints, as shown by the following lemma.
Lemma 3.
Let . Then can be transformed to an equisatisfiable disjunction of normalized constraints or the formula .
Due to the previous lemma, in the rest of the paper we will focus on solving a single normalized constraint.
In the paper, we will also make use of the following result showing decidability of with only flat variables.
Lemma 4 ([19]).
Satisfiability of where is flat for any is decidable in NExpTime.
Proof sketch.
We can reduce into an equisatisfiable Presburger arithmetic formula based on Parikh images of runs of the NFAs for the variables in . Decidability of follows from decidability of Presburger arithmetic. See [19] for details.
The crucial fact that Lemma 4 depends on is that there is a one-to-one mapping between runs in NFAs of flat languages and their Parikh images; this mapping fundamentally breaks for non-flat languages so one cannot directly extend this technique to the non-flat case.
2.2 Lemmas in Our Toolbox
We introduce fundamental lemmas from the area of combinatorics on words that will be used throughout the rest of the paper. The following lemma will be useful to guarantee the existence of conflicts (i.e., non-matching positions) in sufficiently large overlaps of two words and for some primitive words and large constants . Intuitively, we will control the choice of and , and, thus, guarantee that and cannot be powers of the same word, essentially applying the contraposition of the following lemma.
Lemma 5.
Let be a primitive word, and let and be two words such that . Then the word is primitive.
Proof.
Assume that for some . Then we have and for , such that and . Thus, we have
| (2) |
and so the word is not primitive, a contradiction.
Lemma 6 ([31, Proposition 1.2.1 (Fine and Wilf)]).
Let and be two words. If the words and , for any share a common prefix of the length at least , then and are powers of the same word.
Using Lemmas 5 and 6, we provide the following corollary that shows existence of conflicts between arbitrary overlaps of repetitions of primitive words of a sufficient size.
Corollary 7.
Let and be two words where , with and . Then any overlap between and of the size at least contains a conflict.
A natural approach to showing that an assignment satisfies is to show that cannot be written as for any choice of words and . Therefore, one would have to consider all prefixes , infixes , and corresponding suffixes with and show that implies . Note that the choice of the prefix uniquely determines and , and, therefore, we can only refer to different prefixes when showing . The following lemma reduces the number of prefixes we have to consider if we have information about primitive words that are factors of and .
Lemma 8 ([31, Proposition 12.1.3]).
Let be a primitive word, and let for some words . Then either or , but not both.
We will use the next lemma as a recipe for constructing words for non-flat such that has as a factor a primitive word that is sufficiently long for our proofs.
Lemma 9 ([34]).
Let such that , and are string variables and and are integers such that . Then any solution of the equation has the form , , and for some word and numbers .
We provide the following corollary to give insight into how we use Lemma 9 to construct factors that are primitive words of a suitable length.
Corollary 10.
Given two words and such that for any word it holds that , we have that any word for is primitive.
Proof.
By contradiction. Assume that is not primitive, i.e., for some and . Applying Lemma 9, we see that and for some , which contradicts the assumptions of the corrolary.
2.3 Easy Fragments
Before we establish our main result giving the decidability of the hardest fragment of , we first describe what we consider easy fragments and how to deal with them. We assume a normalized constraint.
-
1.
The formula is solvable by length abstraction. This fragment contains formulae that can be solved easily by making the eedle longer than the aystack. Suppose and where every and is either a string variable or a symbol . We can then create a Presburger arithmetic formula over length variables such that . In the formula, and are either 1 (if ) or the length variable (if ), and is a formula constraining the possible values for the length variables (obtained, e.g., using the Parikh images of the variables’ languages). If is satisfiable, so is the original .
-
2.
All variables are flat. In this case, we can use Lemma 4.
3 Overview
We now move to our main result: deciding a hard instance of . We can classify normalized constraints (cf. Section 2.1) that do not fall in the fragments of Section 2.3 based on the occurrences of non-flat variables as follows:
-
1.
constraints where a non-flat variable occurs both in and and
-
2.
constraints where all (and at least one) non-flat variables occur only in .
Note that the above not included cases of (a) all variables being flat and (b) a non-flat variable being only in are covered in Section 2.3. In particular, if there is a variable that only occurs in , then is infinite due to our normalization. Therefore, such a constraint can be solved by making longer than .
We distinguish the classes (1) and (2) above since for (1), the string substituted for some occurrence of in may overlap with the string for an occurrence of in . We deal with the class (1) by substituting two-sided non-flat variables with fresh symbols. In Section 4, we show that if there is a model of the resulting , we can obtain a model of the original constraint from by assigning to a long-enough word that ensures a mismatch for every overlap of in and in . By doing this, we reduce (1) to either (2) or over flat variables (potentially with no variables at all).
For deciding the class (2), given in detail in Section 6, we construct an equisatisfiable formula that uses flat underapproximations of languages associated with the remaining (as some might have been removed at step (1)) non-flat variables present in . Our result is based on the observation that long words in a non-flat language may have a richer structure compared to long words one can construct using flat languages. Therefore, it is unlikely that a flat variable should have a large conflict-free overlap with a non-flat variable in an assignment that assigns these two variables sufficiently long words. In particular, we prove that the original language of can be underapproximated by a flat language while preserving equisatisfiability. After this step, the resulting constraint can be decided using Lemma 4.
4 Removing Two-Sided Non-Flat Variables
In this section, we will show how to transform a normalized constraint with an occurrence of a two-sided non-flat variable into a constraint without occurrences of the variable . The resulting constraint after removing all two-sided non-flat variables can then be solved either by reduction to Presburger arithmetic (Lemma 4; if no non-flat variables remain in the constraint) or by the procedure in Section 6 (if there are still non-flat variables left in ). The main result of this section is the following theorem.
Theorem 11.
Let be a constraint over the alphabet and let be a non-flat variable. Then the formula with is equisatisfiable to .
The proof of the theorem is given below. It is based on the observation that assigning long words to two-sided variables necessarily causes some occurrences of the same variable to overlap. Since these variables are non-flat, we can construct long words with a rich internal structure that will guarantee that any sufficiently long overlap necessarily contains a conflict.
Before we give the proof, let us formally introduce the concept of words that do not allow conflict-free overlaps of two occurrences of the same word larger than a certain bound.
Definition 12 (-aligned word).
Let be a word and . We say that is -aligned if for all such that , is not a prefix of .
Intuitively, is -aligned if it cannot overlap with itself on a prefix/suffix of the length larger than or equal to (except ). For example, the word is -aligned since for no non-empty word of the length at most it holds that is a prefix of . On the other hand, is not -aligned since for of the length 3, it holds that is a prefix of .
4.1 Proof of Theorem 11
If is satisfiable then so is . To see this, take a model and replace the assignment of to #, producing . Then, there will be conflicts of # and some non-# symbol when checking whether is a model of . Alternatively, it might be possible to align with in a manner such that every # in matches some # in . In such a case, if fails to be a model of we reach a contradiction with being a model .
For the other direction, assume that is satisfiable, which means there is a model of . Next, we will show how to construct a word s.t. is a model of .
Focusing on variable , we can write the two sides of the constraint as and where for each and assuming . Moreover, we write the subscript to distinguish -th occurrence of in . As is non-flat, we have that for some words , and where and are not a power of the same word (Fact 1).
The core of our proof is based on the following observation. Since is a model of , the word is not a factor of . Therefore, given any sufficiently long word , if the extension fails to be a model, then there must be at least one occurrence of the word in partially overlapping with an occurrence of the word in , as shown in the picture below. Thus, if we construct a word that cannot partially overlap with itself, we get that is a model of .
Let and be two words where and . By invoking Corollary 10, we see that both and are primitive.
Note that we also have (because , and from the definition of and we have ). We now use these two primitive words and to construct . Let and let be the word where is the smallest number satisfying with . We set . Let us now give two lemmas establishing the properties of ’s infix .
We constructed the infix of in a way so that it prevents conflict-free overlaps with itself as shown by the following lemma.
Lemma 13.
The word is -aligned.
The full proof of Lemma 13 can be found in [24], but the core of the argument lies in observing that in any overlap of with itself of size at least , there is a factor having an overlap with of size , or, similarly for . Therefore, one can apply Lemma 8 and limit overlaps that must be considered.
The following lemma shows that long overlaps between two occurrences of are unavoidable when has a sufficient length. The in the lemma is used just to position the overlap within and .
Lemma 14.
For each , every occurrence of in has an overlap with some occurrence of in of size at least , where is any symbol in .
Proof.
Let us assume an arbitrary occurrence of in . Since each and are separated by (the same goes for and ), it suffices to consider only the pessimistic case, which is when an occurrence of in matches the longest with and on both sides. The situation is schematically depicted below.
In the figure, and denote the overlaps on both sides. We show that the size of at least one overlap and is greater than by expressing the length using its definition and the schematic above:
| (3) | ||||||
We have that , and, thus, at least one of and is bigger than . Since , we have and hence has an overlap of the required size.
It remains to show that is a model of . For the sake of contradiction, assume that is not a model, meaning that is a factor of . From Lemmas 13 and 14 we have that each occurrence of in is perfectly aligned with some in , which also means that ’s are perfectly aligned. Furthermore, we have that ’s in are aligned with consecutive ’s in , i.e., any is aligned with some for some . If this were not the case and we had overlapping with while there were some matching with for , there would have to be some with with , which is a contradiction with being longer than any by construction. Hence, for , is also a factor of , which is a contradiction to being a model of . Therefore, Theorem 11 holds.
5 -Expansion and Prefix/Suffix Trees
At this point, we are left with a normalized constraint where all variables in are flat. If all variables in are also flat, we can use Lemma 4 and obtain the result. In the rest of the paper, we will deal with the case when contains at least one non-flat variable. Before we give the proof in Section 6, in this section, we introduce two concepts that will be used later: -expansion on non-flat variables and prefix/suffix trees.
5.1 -Expansions on Non-Flat Variables
Intuitively, non-flat languages have words with a rich internal structure compared to flat languages. To illustrate, let be a flat variable with the language for some word and let be a non-flat variable with the language . Furthermore, let and be two sufficiently long words. We inspect the case when and share some long common factor . Since , we have for some , , and . As is non-flat, the run of corresponding to the word passes through states at which one can make a choice of which transition to take next. Since is long, we have to make a lot of “right” choices during the run of in order for achieve the common factor , highlighting the difference between the complexity of and , and suggesting that there is a way to pick to prevent long overlaps with flat variables occurring in .
Guided by this intuition, we introduce a tool called -expansion of a non-flat variable . Given a prefix and a suffix , the -expansion of is the word for a particular such that only a prefix or a suffix of a bounded length can have long overlaps with (sufficiently long) words that belong to a flat language. This tool will play an important role in our proofs. Loosely speaking, if we start with a model and we try to find an alternative model , then the possible reasons why fails to be a model are narrowed down to the choice of and .
In order to define the -expansion, we first need some auxiliary definitions. First, as a resulting of our normalization, the map maps any flat variable to a singleton containing the primitive word that forms the basis of , i.e., such that for some . We lift the definition of to a set of flat variables as , and to a string as .
Second, given a variable with , we define the function to give the lexicographically smallest word such that . Having auxiliary definitions in place, we are ready to define -expansion in the context of the formula with .
Definition 15 (-expansion).
Let be a decomposed non-flat variable and be a DFA s.t. . Moreover, let be a state such that and with for any word . Furthermore, let be some prefix and be a state such that for some . Similarly, let be a suffix and be a state such that for some .
Let for a minimal such that for any . Given , we define the -expansion of to be the word .
Intuitively, -expansion takes a prefix and finds the shortest word that takes the automaton to the state in which we have the freedom to read the words and in any suitable sequence. We loop through in a specific manner so that the resulting factor is primitive thanks to Corollary 10. The situation with the suffix is symmetric. Note that in the following section, we use prefix/suffix variants of the -expansion defined as and .
The following lemma shows that -expansion can be seen almost as introducing a fresh symbol # for the infix connecting and into a word . Intuitively, if we have an assignment that assigns sufficiently long words to all flat variables, then we can find such that any large overlap between and contains a conflict. We use to be the length of the longest literal in , to be the number of states of the largest DFA specifying the language of some variable , and .
Lemma 16.
Let be a decomposed non-flat variable, , and . Further, let be such that and let be an assignment with (i) for any flat variable and (ii) . Every overlap between and of the size at least contains a conflict.
Proof sketch..
It suffices to observe that if the overlap of size at least necessarily contains an overlap between and for some flat variable with . The existence of a conflict follows from Corollary 7. The full proof can be found in [24].
Next, we show that -expansion can be used to facilitate modularity in our proofs, allowing us to search for a suitable prefix and a suitable suffix separately. Searching for and separately requires subtle modifications to , resulting in us searching for and in the context of the modified formulae and , respectively. If we find models of and of a particular form, we compose them into a model of .
Let be a non-flat variable, and let and be two fresh variables with their languages restricted to and . Let and be formulae defined as and where # is a fresh alphabet symbol. Further, let and be two models such that:
-
1.
and agree on the values of variables different than and ,
-
2.
for any flat variable ,
-
3.
and such that for .
Lemma 17.
The assignment is a model of where .
Proof sketch..
The idea behind the proof is that if , then would need to have a large conflict-free overlap with for some flat variable . Applying Corollary 7, we reach a contradiction. The full proof of Lemma 17 can be found in [24].
5.2 Prefix (Suffix) Enumeration through Prefix (Suffix) Trees
Having defined -expansion that acts similarly to inserting a fresh symbol # between a chosen and a suffix of a non-flat variable , we can start enumerating prefixes (or suffixes) up to a certain bound, while searching for a model. We introduce the concept of prefix (suffix) trees that play a major role in our proofs. Below, we give only the definition of a prefix tree; a suffix tree is defined symmetrically.
Definition 18 (Choice state).
Let be a DFA. We say that a state is a choice state if . We write to denote the set of all choice states of .
Definition 19 (Prefix tree).
Let be a variable with its language given by a DFA . We define ’s prefix tree as an (infinite finitely-branching) tree with vertices rooted in such that
-
is a function that labels non-root vertices of with ’s choice states, i.e., for any and ,
-
is a set of labelled edges such that iff there is a run in where for all it holds that .
-
is a function that maps any two vertices connected by an edge to the label on the edge, i.e., iff there exists an edge and is undefined otherwise.
Intuitively, vertices of the tree are labeled by ’s choice states , i.e., states in which we can choose between multiple outgoing transitions along different alphabet symbols. Vertices and are connected by an edge in if is reachable from without passing through any choice state.
A path in is a sequence of vertices where for any . We lift the definition of to paths as .
Definition 20 (Dead-end vertex of a prefix tree).
Let and be the prefix tree for , and let be a partial assignment. A vertex is called a dead end in w.r.t. if where for being the (single) path between and in .
Intuitively, dead-end vertices (and all vertices that are below them in the prefix tree) are not interesting for obtaining a model. Consider, e.g., with and . We have and, thus, the vertex corresponding to the prefix abca is a dead end in w.r.t. since is a factor of .
Definition 21 (-reaching path).
Let be a path in a prefix tree and . We say that is -reaching if .
In our proof, we explore all prefixes of words in a language up to a certain bound . As we have a prefix tree with edges labelled with words of (possibly) different lengths, stating that we have explored all prefixes of the length precisely is problematic. Hence, the concept of -reaching paths is a relaxation allowing paths (prefixes) to slightly vary in length.
6 Underapproximating Non-Flat Variables
In this section, we give the main lemma allowing to underapproximate the language of non-flat variables with a flat language. Throughout this section we use three constants with the following semantics:
-
The constant is the length of prefixes (suffixes) of non-flat variables that we will enumerate in our proofs, searching a model that is shorter w.r.t. some non-flat variable.
-
The constant is the minimal size of words assigned to flat variables occurring in .
-
is used as the value of the parameter in every application of or .
First, let us define parameters of that we use to define the above bounds. Let be the length of the longest string literal in and , let be the largest number of states of a DFA associated with some variable. Furthermore, let be the length of the longest word in the set where is the set of the primitive words used to define the -expansion for every non-flat variable .
Since and depend on the value of , we start by fixing . Intuitively, for any non-flat variable , we set up in a way so that if we consider all prefixes in up to the length , then will contain paths through any state since is a single SCC. After extending these paths up to the length , we can guarantee that will contain all words read from any state of the length at least . Considering all possible words of the length for some readable from a state will be crucial later, as we will show that there can be only a few such words if we fail to find an alternative model s.t. , assuming the existence of a model .
The remaining bounds and are defined as and . Ignoring some technical details and due to reasons that will be revealed shortly, we need to be slightly longer than , so that when we later construct for some particular prefix , we can establish some of the string that precedes an occurrence of in in the case fails to be a model. Finally, is set up so that together with they allow Lemma 17 to be applied, where and play the role of and , respectively.
We remark that the exact values of , , and are not important when reading the proof for the first time. It is sufficient to note that , and that the difference in sizes between these bounds is sufficiently large.
6.1 Overcoming the Infinite by Equivalence with a Finite Index
Our procedure to decide containing non-flat variables in originates in enumeration of partial assignments , since it is easy to find suitable values for non-flat variables when is a literal due to us fixing values of all variables in . The problem is that there is an infinite number of such assignments. Our key observation allowing us prove that we can underapproximate non-flat languages using flat ones is that the precise values of flat variables occurring in does not matter as long as these variables have assigned sufficiently long words. In general, however, a model might assign long words only to a subset of flat variables. Therefore, in our decision procedure, we first guess the set of flat variables that are assigned words shorter than . Since there are finitely many such words, we have a finite number of possible choices of values these variables can attain. We enumerate all possible valuations , and for every such a valuation we produce a new constraint in which we replace every short variable by the word . The regular constraints restricting the remaining flat variables are modified to permit only words longer than , allowing us to assume in our proofs that flat variables in have assigned sufficiently long words.
Let us formalize our observation that the precise length of words assigned to flat variables does not matter as long as they are sufficiently long. Let be a partial assignment. We define the set as . Given a constant , we say that two partial assignments and are -equivalent denoted by iff .
Clearly, has a finite index and if there exists a model of , then its restriction will fall into one of the equivalence classes induced by . Setting , we inspect all equivalence classes, checking whether any of them contains a model. Given a representative of an equivalence class, we replace all variables with if , producing a new constraint . Furthermore, we need to include the fact that the remaining variables in have assigned long words. Therefore, the languages of all variables such that will have their their languages restricted to a new language in . The resulting constraint is clearly equisatisfiable to with models restricted to be -equivalent to .
Some of these instances can be decided without any additional work. In particular, if we have an assignment such that , i.e., all variables occurring in are short, we fix values of all variables in , and, thus, the eedle of is a word. The remaining instances with contain at least one occurrence of a (long) variable in , and, thus, their decidability requires investigation.
6.2 Inspecting the Structure of Non-flat Variables in the Presence of Long Flat Variables
Throughout this section, we fix be a instance resulting from the previous section, i.e., for some equivalence class representative such that . We start by stating the key theorem for our decidability result.
Theorem 22.
Let be a (decomposed) non-flat variable present in . There is a flat language s.t. if there exists a model , then there exists a model s.t. for some word .
Before presenting quite technical lemmas that allowed us to obtain the result, let us derive some intuition on why the theorem holds. Assume that we have a model of and we we pick some long prefix and a long suffix for the variable , and we glue them together using -expansion to produce a word and an altered assignment . The core of the theorem lies in analyzing the situation when fails to be a model. By symmetry, we focus on the case when our choice of the prefix is problematic. We have two possibilities.
-
There is a short prefix of due to which fails to be a model. We address this by systematically exploring the prefix tree of up to a certain bound, marking the vertices that correspond to such prefixes as dead ends.
-
Our choice of does not cause to immediately fail to be a model, however, by applying -expansion we introduce an infix due to which . Since we assume that results from a previous section, we know that all flat variables have assigned a long word, i.e., contains long factors of the form for some which forms the basis of a flat variable and . -expansion glues together a prefix and a suffix using a word where for any base . Therefore, we know that only a limited part of the infix introduced by -expansion is problematic, otherwise we would have a long overlap between and some factor of . Thus, contains a long factor for some and a primitive word that forms the base of a flat variable present in . We carefully analyze the effect of such a factor on the structure of .
We now provide an overview of lemmas that lead to Theorem 22. Since these lemmas are quite technical, we accompany them with intuition and only sketch their proofs. Full proofs can be found in [24]. To simplify the presentation, we focus primarily on attempting to find a suitable prefix of a non-flat variable, and hence, our results are formulated in the context of a modified formula that contains a fresh alphabet symbol #. Since the situation is symmetric for suffixes, we can use the properties of -expansion (Lemma 17) and glue together a suitable prefix and a suitable suffix to produce an altered model.
We start with a technical lemma used frequently in our proofs. The lemma shows that if we know that is a factor of , and we know that a part of in the proximity of the factor is incompatible with , then we can show that a large number of overlaps between and must contain a conflict if contains a large factor of the form .
Lemma 23.
Let and be two primitive words, such that . Let and where and are (possibly empty) words such that and . If 1. the prefix of of the size does not contain the word , 2. and , then is not a factor of .
Proof sketch..
Since contains a long factor and contains at least one , we can apply Lemma 8 to rule out a lot of overlaps that might be conflict-free. All of the remaining overlaps contain a conflict thanks to Condition 2. The full proof is available in [24].
Lemma 24.
Let be a flat variable with , and let be a (decomposed) non-flat variable. Let be formula with being a prefix of and being a non-empty word such that the word is not a prefix of .
If there exists a model with being of the form for some word , , and a suffix of , then is a model of .
Intuitively, the rightmost variable in is the flat variable with . To the right of , there is a literal with the prefix that resembles the flat language . Moreover, also starts with a prefix resembling , followed by . Thus, the prefix of the word mimics the suffix of the right-hand side . Hence, if we look solely on the prefix of and the suffix of , there are no obvious conflicts.
However, , and, therefore, there must be a conflict outside of when considering the above alignment. The rest of the proof can be found in [24].
Next, we derive a lemma formalizing that we can restrict languages of non-flat variables to flat ones, producing an equisatisfiable instance. Stating a symmetric lemma for suffixes, and applying Lemma 17 would give us the entire proof of Theorem 22.
Lemma 25.
Let be the rightmost flat variable with , and let be a non-flat variable occurring in . Let be a formula where , , is a prefix of , and is a non-empty word such that is not a prefix of . There exists a flat language s.t. if there is a model , then for some word .
The intuition behind Lemma 25 is the same as the intuition behind Theorem 22. We note that the lemma requires the word to be non-empty. The case for when has a similar, but simpler proof.
Proof sketch.
We assume the existence of a model , and we systematically explore the prefix tree of the variable in a breadth-first fashion up to the bound , searching for the word . When inspecting any prefix , we check whether is a model of , and if not we mark the last vertex corresponding to as a dead end and we do not explore it further. At the end of the exploration, we inspect the set of all -reaching paths in . If there are no such paths, we know that , and hence, can be found in the finite (flat) language . Alternatively, we check for every path in whether where is the prefix corresponding to . Since, the number of possible -reaching paths is finite, we have that can be found in the flat language in the case that -expansion of some leads to a model of .
Next, it might be that none of the paths in can be -expanded into a model. Let be the rightmost variable in with . Recall that none of the paths in contain a dead-end, and that all variables in are flat, and, thus, they have assigned long words. Combined with the properties of -expansion, we know the reason why fails to be a model, i.e., we almost accurately know the position of in . We show that all paths in share the same prefix of the form for some large and and . Since is non-flat, and is larger than the number of states of , we have opportunities to diverge from the shared prefix in . We show that diverging must immediately lead to a dead-end vertex, and in such a case has a prefix . Hence, we apply Lemma 23 and obtain that is a model of where . Note that depends on an unknown integer , however, the language containing all s for every possible choice of is flat. Alternatively, not-diverging from the path implies that for some , which is again a flat language.
A careful analysis of the proof of Lemma 25 reveals that the lemma, and, therefore, Theorem 22, is not effective in a sense that one cannot directly construct . However, we can obtain a decision procedure at the cost of a producing larger flat language. We construct and all paths up to the bound without having available, losing the ability to mark dead-end vertices. In the resulting flat language we include all words shorter than , and -expansions of all -reaching paths. The remaining parts of that correspond to the situation when no -reaching paths can be -expanded into a model can be computed from without requiring access to the original model . For details, we refer the reader to the full proof of Lemma 25 in [24].
7 Decision Procedure
Finally, we summarize the approach described in previous sections into a decision procedure for . The (nondeterministic) algorithm is shown in Algorithm 1. In the algorithm, for a negated containment and a (partial) assignment , we use to denote the predicate obtained from replacing variables whose assignment is defined with the corresponding assignment.
The set () contains prefixes (suffixes) of words from that might be used to find an alternative model. The procedure glues together prefixes and suffixes, resulting in a language consisting of entire words (not just prefixes or suffixes) from . The procedure partitions the language into such that consists of words resulting from an application of -expansion. Intuitively, the words in are words from whereas are only prefixes that need to be completed into full words from by concatenating suitable suffixes. We decompose in the same way into . The procedure then returns
Theorem 26 (Soundness).
If Algorithm 1 terminates with an assignment , then .
Proof.
Follows from the fact that for every non-flat variable found in .
Theorem 27 (Completeness).
If is satisfiable, then Algorithm 1 terminates with an assignment such that . Otherwise Algorithm 1 terminates with the answer .
Proof.
Correctness for two-sided non-flat variables follows from Theorem 11. For remaining non-flat variables, correctness follows from Lemma 25. Finally, correctness of the procedure follows from Lemma 17.
Theorem 28.
A constraint is decidable in ExpSpace.
Proof sketch..
Decidability follows from the analysis of the decision procedure in Algorithm 1 given above. As for ExpSpace membership, first notice that languages of non-flat variables occurring only in are replaced with flat languages of polynomial size due to the bounds and . Algorithm 1 then uses Lemma 4, bringing the complexity of the procedure to NExpTime. However, obtaining a full model that includes all two-sided non-flat variables brings the procedure to ExpSpace as the length of the word to assigned to a two-sided non-flat variable doubles with each such a variable (cf. Theorem 11).
7.1 Chain-Free Word Equations with
After establishing the decidability of a single predicate, we immediately obtain decidability of string fragments that permit the so-called monadic decomposition [46, 16], i.e., expressing the set of solutions as a finite disjunction of regular membership constraints . These include fragments such as the straight-line fragment [29] or the more expressive chain-free fragment of word equations [4] (note that [4] considers also other predicates). We can therefore easily establish the following theorem.
Theorem 29.
Formula where is a conjunction of chain-free word equations is decidable.
8 Future Work
This paper shows that chain-free word equations with regular constraints and a single instance of the predicate are decidable. There are several possible future work directions. First, we wish to investigate the fragment where the number of constraints is not limited to a single one. Another direction is examining combinations of with other predicates, such as length constraints or disequalities. The technique for combining these from [19] based on the reduction of the constraints to reasoning over Parikh images of finite automata is not directly applicable here. Also, the resulting complexity of our procedure is ExpSpace, however, we have hints that the problem might in fact be solvable in NP.
References
- [1] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Ahmed Rezine, and Philipp Rümmer. Flatten and conquer: A framework for efficient analysis of string constraints. In Albert Cohen and Martin T. Vechev, editors, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 602–617. ACM, 2017. doi:10.1145/3062341.3062384.
- [2] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Ahmed Rezine, and Philipp Rümmer. Trau: SMT solver for string constraints. In Nikolaj S. Bjørner and Arie Gurfinkel, editors, 2018 Formal Methods in Computer Aided Design, FMCAD 2018, Austin, TX, USA, October 30 - November 2, 2018, pages 1–5. IEEE, 2018. doi:10.23919/FMCAD.2018.8602997.
- [3] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Denghang Hu, Wei-Lun Tsai, Zhillin Wu, and Di-De Yen. Solving not-substring constraint with flat abstraction. In Programming Languages and Systems, pages 305–320, Cham, 2021. Springer International Publishing. doi:10.1007/978-3-030-89051-3_17.
- [4] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bui Phi Diep, Lukáš Holík, and Petr Janků. Chain-free string constraints. In Yu-Fang Chen, Chih-Hong Cheng, and Javier Esparza, editors, Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Taipei, Taiwan, October 28-31, 2019, Proceedings, volume 11781 of Lecture Notes in Computer Science, pages 277–293. Springer, 2019. doi:10.1007/978-3-030-31784-3_16.
- [5] C. Aiswarya, Soumodev Mal, and Prakash Saivasan. On the satisfiability of context-free string constraints with subword-ordering. In Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’22, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3531130.3533329.
- [6] Rajeev Alur and Pavol Černý. Streaming transducers for algorithmic verification of single-pass list-processing programs. SIGPLAN Not., 46(1):599–610, January 2011. doi:10.1145/1925844.1926454.
- [7] Clark W. Barrett, Cesare Tinelli, Morgan Deters, Tianyi Liang, Andrew Reynolds, and Nestan Tsiskaridze. Efficient solving of string constraints for security analysis. In William L. Scherlis and David Brumley, editors, Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, April 19-21, 2016, pages 4–6. ACM, 2016. doi:10.1145/2898375.2898393.
- [8] Murphy Berzish, Joel D. Day, Vijay Ganesh, Mitja Kulczynski, Florin Manea, Federico Mora, and Dirk Nowotka. Towards more efficient methods for solving regular-expression heavy string constraints. Theor. Comput. Sci., 943:50–72, 2023. doi:10.1016/j.tcs.2022.12.009.
- [9] Murphy Berzish, Mitja Kulczynski, Federico Mora, Florin Manea, Joel D. Day, Dirk Nowotka, and Vijay Ganesh. An SMT solver for regular expressions and linear arithmetic over string length. In Alexandra Silva and K. Rustan M. Leino, editors, Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part II, volume 12760 of Lecture Notes in Computer Science, pages 289–312. Springer, 2021. doi:10.1007/978-3-030-81688-9_14.
- [10] Berzish, Murphy. Z3str4: A solver for theories over strings. PhD thesis, University of Waterloo, 2021. URL: http://hdl.handle.net/10012/17102.
- [11] Nikolaj S. Bjørner, Nikolai Tillmann, and Andrei Voronkov. Path feasibility analysis for string-manipulating programs. In Stefan Kowalewski and Anna Philippou, editors, Tools and Algorithms for the Construction and Analysis of Systems, 15th International Conference, TACAS 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings, volume 5505 of Lecture Notes in Computer Science, pages 307–321. Springer, 2009. doi:10.1007/978-3-642-00768-2_27.
- [12] František Blahoudek, Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Word equations in synergy with regular constraints. In Marsha Chechik, Joost-Pieter Katoen, and Martin Leucker, editors, Formal Methods - 25th International Symposium, FM 2023, Lübeck, Germany, March 6-10, 2023, Proceedings, volume 14000 of Lecture Notes in Computer Science, pages 403–423. Springer, 2023. doi:10.1007/978-3-031-27481-7_23.
- [13] Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. What is decidable about string constraints with the replaceall function. Proc. ACM Program. Lang., 2(POPL):3:1–3:29, 2018. doi:10.1145/3158091.
- [14] Taolue Chen, Alejandro Flores-Lamas, Matthew Hague, Zhilei Han, Denghang Hu, Shuanglong Kan, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. Solving string constraints with regex-dependent functions through transducers with priorities and variables. Proc. ACM Program. Lang., 6(POPL):1–31, 2022. doi:10.1145/3498707.
- [15] Taolue Chen, Matthew Hague, Jinlong He, Denghang Hu, Anthony Widjaja Lin, Philipp Rümmer, and Zhilin Wu. A decision procedure for path feasibility of string manipulating programs with integer data type. In Dang Van Hung and Oleg Sokolsky, editors, Automated Technology for Verification and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19-23, 2020, Proceedings, volume 12302 of Lecture Notes in Computer Science, pages 325–342. Springer, 2020. doi:10.1007/978-3-030-59152-6_18.
- [16] Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. Decision procedures for path feasibility of string-manipulating programs with complex operations. Proc. ACM Program. Lang., 3(POPL):49:1–49:30, 2019. doi:10.1145/3290362.
- [17] Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Solving string constraints with lengths by stabilization. Proceedings of the ACM on Programming Languages, 7(OOPSLA2):2112–2141, 2023. doi:10.1145/3622872.
- [18] Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Z3-Noodler: An automata-based string solver. In Bernd Finkbeiner and Laura Kovács, editors, Tools and Algorithms for the Construction and Analysis of Systems - 30th International Conference, TACAS 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part I, volume 14570 of Lecture Notes in Computer Science, pages 24–33. Springer, 2024. doi:10.1007/978-3-031-57246-3_2.
- [19] Yu-Fang Chen, Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. A uniform framework for handling position constraints in string solving. Proc. ACM Program. Lang., 9(PLDI), 2025. doi:10.1145/3729273.
- [20] Joel D. Day, Thorsten Ehlers, Mitja Kulczynski, Florin Manea, Dirk Nowotka, and Danny Bøgsted Poulsen. On solving word equations using SAT. In RP’19, volume 11674 of LNCS, pages 93–106. Springer, 2019. doi:10.1007/978-3-030-30806-3_8.
- [21] Joel D. Day, Vijay Ganesh, Nathan Grewal, and Florin Manea. On the expressive power of string constraints. Proc. ACM Program. Lang., 7(POPL), January 2023. doi:10.1145/3571203.
- [22] Joel D. Day, Vijay Ganesh, Paul He, Florin Manea, and Dirk Nowotka. The satisfiability of extended word equations: The boundary between decidability and undecidability. CoRR, abs/1802.00523, 2018. arXiv:1802.00523.
- [23] Leonardo Mendonça de Moura and Nikolaj S. Bjørner. Z3: An efficient SMT solver. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008. doi:10.1007/978-3-540-78800-3_24.
- [24] Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. Negated string containment is decidable (technical report). CoRR, abs/2506.22061, 2025. arXiv:2506.22061.
- [25] Juhani Karhumäki, Filippo Mignosi, and Wojciech Plandowski. The expressibility of languages and relations by word equations. J. ACM, 47(3):483–505, 2000. doi:10.1145/337244.337255.
- [26] Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark W. Barrett, and Morgan Deters. A DPLL(T) theory solver for a theory of strings and regular expressions. In Armin Biere and Roderick Bloem, editors, Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18-22, 2014. Proceedings, volume 8559 of Lecture Notes in Computer Science, pages 646–662. Springer, 2014. doi:10.1007/978-3-319-08867-9_43.
- [27] Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark W. Barrett, and Morgan Deters. An efficient SMT solver for string constraints. Formal Methods Syst. Des., 48(3):206–234, 2016. doi:10.1007/s10703-016-0247-6.
- [28] Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark W. Barrett. A decision procedure for regular membership and length constraints over unbounded strings. In Carsten Lutz and Silvio Ranise, editors, Frontiers of Combining Systems - 10th International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, 2015. Proceedings, volume 9322 of Lecture Notes in Computer Science, pages 135–150. Springer, 2015. doi:10.1007/978-3-319-24246-0_9.
- [29] Anthony Widjaja Lin and Pablo Barceló. String solving with word equations and transducers: Towards a logic for analysing mutation XSS. In Rastislav Bodík and Rupak Majumdar, editors, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 123–136. ACM, 2016. doi:10.1145/2837614.2837641.
- [30] M. Lothaire, editor. Combinatorics on Words. Cambridge Mathematical Library. Cambridge University Press, 2 edition, 1997.
- [31] M. Lothaire. Algebraic Combinatorics on Words. Cambridge University Press, 2002.
- [32] Kevin Lotz, Amit Goel, Bruno Dutertre, Benjamin Kiesl-Reiter, Soonho Kong, Rupak Majumdar, and Dirk Nowotka. Solving string constraints using sat. In Constantin Enea and Akash Lal, editors, Computer Aided Verification, pages 187–208, Cham, 2023. Springer Nature Switzerland. doi:10.1007/978-3-031-37703-7_9.
- [33] Zhengyang Lu, Stefan Siemer, Piyush Jha, Joel D. Day, Florin Manea, and Vijay Ganesh. Layered and staged Monte Carlo tree search for SMT strategy synthesis. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 1907–1915. ijcai.org, 2024. URL: https://www.ijcai.org/proceedings/2024/211.
- [34] R. C. Lyndon and M. P. Schützenberger. The equation in a free group. Michigan Mathematical Journal, 9(4):289–298, 1962. doi:10.1307/mmj/1028998766.
- [35] G S Makanin. The problem of solvability of equations in a free semigroup. Mathematics of the USSR-Sbornik, 32(2):129, February 1977. doi:10.1070/SM1977v032n02ABEH002376.
- [36] Jakob Nielsen. Die isomorphismen der allgemeinen, unendlichen gruppe mit zwei erzeugenden. Mathematische Annalen, 78(1):385–397, 1917.
- [37] Aina Niemetz, Mathias Preiner, Andrew Reynolds, Clark W. Barrett, and Cesare Tinelli. Syntax-guided quantifier instantiation. In Jan Friso Groote and Kim Guldstrand Larsen, editors, Tools and Algorithms for the Construction and Analysis of Systems - 27th International Conference, TACAS 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, Part II, volume 12652 of Lecture Notes in Computer Science, pages 145–163. Springer, 2021. doi:10.1007/978-3-030-72013-1_8.
- [38] Andres Nötzli, Andrew Reynolds, Haniel Barbosa, Clark W. Barrett, and Cesare Tinelli. Even faster conflicts and lazier reductions for string solvers. In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part II, volume 13372 of Lecture Notes in Computer Science, pages 205–226. Springer, 2022. doi:10.1007/978-3-031-13188-2_11.
- [39] Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. In 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA, pages 495–500. IEEE Computer Society, 1999. doi:10.1109/SFFCS.1999.814622.
- [40] Andrew Reynolds, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. High-level abstractions for simplifying extended string constraints in SMT. In Isil Dillig and Serdar Tasiran, editors, Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part II, volume 11562 of Lecture Notes in Computer Science, pages 23–42. Springer, 2019. doi:10.1007/978-3-030-25543-5_2.
- [41] Andrew Reynolds, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. Reductions for strings and regular expressions revisited. In 2020 Formal Methods in Computer Aided Design, FMCAD 2020, Haifa, Israel, September 21-24, 2020, pages 225–235. IEEE, 2020. doi:10.34727/2020/isbn.978-3-85448-042-6_30.
- [42] Andrew Reynolds, Maverick Woo, Clark W. Barrett, David Brumley, Tianyi Liang, and Cesare Tinelli. Scaling up DPLL(T) string solvers using context-dependent simplification. In Rupak Majumdar and Viktor Kuncak, editors, Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes in Computer Science, pages 453–474. Springer, 2017. doi:10.1007/978-3-319-63390-9_24.
- [43] Neha Rungta. A billion SMT queries a day (invited paper). In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part I, volume 13371 of Lecture Notes in Computer Science, pages 3–18. Springer, 2022. doi:10.1007/978-3-031-13185-1_1.
- [44] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. A symbolic execution framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berleley/Oakland, California, USA, pages 513–528. IEEE Computer Society, 2010. doi:10.1109/SP.2010.38.
- [45] Trauc string constraints benchmark collection, 2020. URL: https://github.com/plfm-iis/trauc_benchmarks.
- [46] Margus Veanes, Nikolaj S. Bjørner, Lev Nachmanson, and Sergey Bereg. Monadic decomposition. J. ACM, 64(2):14:1–14:28, 2017. doi:10.1145/3040488.
