Negated String Containment Is Decidable

Havlena, Vojtěch; Hečko, Michal; Holík, Lukáš; Lengál, Ondřej

doi:10.4230/LIPIcs.MFCS.2025.56

Vojtěch Havlena
Brno University of Technology, Czech Republic

Michal Hečko
Brno University of Technology, Czech Republic

Lukáš Holík
Aalborg University, Denmark
Brno University of Technology, Czech Republic

Ondřej Lengál
Brno University of Technology, Czech Republic

Abstract

We provide a positive answer to a long-standing open question of the decidability of the not-contains string predicate. Not-contains is practically relevant, for instance in symbolic execution of string manipulating programs. Particularly, we show that the predicate $\neg\mathrm{Contains}(x_{1}\ldots x_{n},y_{1}\ldots y_{m})$ , where $x_{1}\ldots x_{n}$ and $y_{1}\ldots y_{m}$ are sequences of string variables constrained by regular languages, is decidable. Decidability of a not-contains predicate combined with chain-free word equations and regular membership constraints follows.

Keywords and phrases:

not-contains, string constraints, word combinatorics, primitive word

2012 ACM Subject Classification:

Theory of computation $\rightarrow$ Regular languages; Theory of computation $\rightarrow$ Automated reasoning; Theory of computation $\rightarrow$ Logic and verification

Related Version:

Technical Report: http://arxiv.org/abs/2506.22061 [24]

Acknowledgements:

We thank the anonymous reviewers for careful reading of the paper and their suggestions that greatly improved its quality.

Funding:

This work was supported by the Czech Ministry of Education, Youth and Sports ERC.CZ project LL1908, the Czech Science Foundation project 25-18318S, and the FIT BUT internal project FIT-S-23-8151. The work of Michal Hečko, a Brno Ph.D. Talent Scholarship Holder, is funded by the Brno City Municipality.

Event:

50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Editors:

Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

String constraints have recently been studied intensively because of their applications in the analysis of string manipulation in programs, e.g., in analyses of the security of web applications or of cloud resource access policies [43]. Apart from a plethora of practical solvers, e.g., cvc5 [26, 27, 7, 28, 42, 38, 41], Z3 [23, 11, 33], Ostrich [29, 13, 16, 14, 15], Z3-Noodler [18, 17, 12], Trau [4, 2, 1], Z3Str/2/3/4/3RE [10, 9, 8], Woorpje [20], and nfa2sat [32], the theoretical landscape of string constraints has been intensely studied too. The seminal work of Makanin [35], establishing decidability of word equations, was followed by the work of Plandowski [39] (and later Jeż’s work on recompression) that placed the problem in PSpace. A number of relatively recent works study extensions of string constraints with constraints over string lengths, transducer-defined relational constraints, string-integer conversions, extensions of regular properties, replace-all, etc. As the extended string constraints are in general undecidable, these works focus on finding practically relevant decidable fragments such as the straight-line [16, 29, 13, 15, 14] and chain-free [4, 17] fragments, quadratic equations [36], and others (e.g., [5, 21]).

The most essential constraints, from the practical perspective, are considered to be word equations, regular membership constraints, length constraints, and also $\neg\mathrm{Contains}$ , as argued, e.g., in [44], and as can also be seen in benchmarks, for instance, in [45, 3]. While the three former types of constraints are intensely studied, $\neg\mathrm{Contains}$ has received little attention. Yet, it is both important and theoretically interesting: besides its occurrence in existing benchmarks, its importance also follows from its ability to capture other highly practical types of constraints. For example, the $\mathrm{indexOf}(x,y)$ function should return the position of the first occurrence of $y$ in $x$ . It can be converted to the word equation $x=p.y.s$ , after which the returned value equals $|p|$ . To ensure that this occurrence of $y$ is indeed the first one in $x$ , there should be no occurrence of $y$ in $p.y^{\prime}$ where $y^{\prime}$ is the prefix of $y$ without the last symbol, i.e., $y^{\prime}.z=y$ for $z\in\Sigma$ . This can be expressed as $\neg\mathrm{Contains}(y,p.y^{\prime})$ (e.g., Z3 solves $\mathrm{indexOf}$ in this way [23]).
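The $\mathrm{indexOf}$ encoding can be illustrated by a small Python sketch. The function names `not_contains` and `index_of_via_not_contains` are ours and serve only as an illustration; the sketch enumerates the splits $x=p.y.s$ by brute force instead of solving constraints, and it assumes that an empty $y$ is found at position 0.

```python
def not_contains(needle: str, haystack: str) -> bool:
    """Concrete-string version of the predicate: needle is not a factor of haystack."""
    return needle not in haystack


def index_of_via_not_contains(x: str, y: str) -> int:
    """Return |p| for a split x = p.y.s with not_contains(y, p.y'), or -1 if none exists."""
    if y == "":
        return 0  # assumption of this sketch: the empty word occurs at position 0
    y_prime = y[:-1]  # y without its last symbol
    for cut in range(len(x) - len(y) + 1):
        p = x[:cut]
        # x = p.y.s holds for this cut iff y starts at position cut
        if x.startswith(y, cut) and not_contains(y, p + y_prime):
            return cut
    return -1
```

On concrete strings the result should agree with Python's built-in `str.find`, since the side condition rejects every split whose prefix $p.y^{\prime}$ already contains an earlier occurrence of $y$.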

As mentioned above, the problem is also interesting from the theoretical perspective. Although the positive version, $\mathrm{Contains}$ , can be easily encoded using word equations, the negation is difficult. Its precise conversion to word equations would require universal quantification, which is undecidable for word equations in general [22]. The most systematic attempts at solving $\neg\mathrm{Contains}$ have been made in [3, 19]. In [3], the authors extend the flattening underapproximating framework behind the solver Trau [2, 1] and give a precise solution for $\neg\mathrm{Contains}$ if all involved string variables are constrained by flat languages (a flat language here stands for a finite union of concatenations of iterations of words) and, moreover, if no string variable appears multiple times, thus avoiding most of the difficulty of the problem. Our recent work [19], on top of which we build here, proceeds in a similar direction and removes the restriction of [3] on multiple occurrences of variables, but still requires all languages to be flat, which is a quite severe restriction. Practical heuristics used in solvers generally solve only easy cases and quickly fail on more complex ones, cf. [19], and do not give any guarantees. E.g., cvc5 translates $\neg\mathrm{Contains}$ into a universally quantified disequality [40], which is in turn handled by cvc5’s incomplete quantifier instantiation [37].

In this paper, we show decidability of a much more general kind of $\neg\mathrm{Contains}$ than [19, 3], namely of the form $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ where $\mathcal{N}$eedle and $\mathcal{H}$aystack are string terms (sequences of symbols and variables) and $\Phi_{\mathcal{L}}$ constrains variables by any regular language. The constraint is satisfied by an assignment to string variables respecting $\Phi_{\mathcal{L}}$ under which $\mathcal{N}$ is not a factor (i.e., a continuous subword) of $\mathcal{H}$ (i.e., if $\mathcal{N}$eedle cannot be found in $\mathcal{H}$aystack).

Our solution of the problem leads relatively deep into word combinatorics and automata theory. We rely on the result in [19] giving a decision procedure for a flat-language version of the problem. The work [19] uses an automata-based construction inspired by deciding functional equivalence of streaming string transducers [6]. Using a variation on automata Parikh images, it transforms the problem into an equisatisfiable Presburger arithmetic formula (which is decidable). The general case with variables restricted by arbitrary regular languages, the subject of this paper, is solved by a reduction to this flat-language fragment. The core idea of our proof is that we can always find fresh primitive words in non-flat languages that can be repeated an arbitrary number of times. The result of such a repetition is a word that can share with other variables only subwords of a bounded size, assuming all words assigned to variables are sufficiently long. The reduction technically requires a dive into combinatorics on words and results on primitive words [31, 35, 34, 30], which are closely related to flat languages. Our technique shares traits with the work of Karhumäki et al. [25], which constructs long primitive words to show that disjunctions of word equations can be encoded into a single equation. First, for variables with non-flat languages occurring on both sides of the constraint, we show that we can replace each of them with a single fresh symbol. This is because non-flat languages allow us to choose a sufficiently complex word for the variable $x$ that can be matched only with the value of $x$ on the other side ( $\mathcal{N}$ is the other side of $\mathcal{H}$ and vice versa). For variables with non-flat languages that appear only in $\mathcal{H}$ , we show that after enumerating all possible assignments for them up to a certain bound, their languages can be underapproximated by flat languages while preserving satisfiability.

2 Preliminaries

Numbers.

We use $\mathbb{N}$ for natural numbers (including zero). For $m,n\in\mathbb{N}$ , their greatest common divisor is denoted as $\mathbf{gcd}(m,n)$ and their least common multiple is denoted as $\mathbf{lcm}(m,n)$ .

Words.

An alphabet $\Sigma$ is a finite non-empty set of symbols. Let $\Sigma$ be fixed for the rest of the paper. A (finite) word $w$ over $\Sigma$ is a sequence of symbols $w=a_{1}\ldots a_{n}$ from $\Sigma$ , where $n$ is the length of $w$ , denoted as $|w|$ . The empty word (of length 0) is denoted by $\epsilon$ and the concatenation of two words $u$ and $v$ is denoted as $u\circ v$ (or shortly $u v$ ). An iteration of a word $w$ is defined as $w^{0}\mathrel{\triangleq}\epsilon$ and $w^{i+1}\mathrel{\triangleq}w^{i}\circ w$ for $i\geq 0$ . The set of all words over $\Sigma$ is denoted as $\Sigma^{*}$ . A primitive word cannot be written as $v^{i}$ for any $v$ and $i>1$ , and we will use Greek letters $\alpha,\beta,\gamma,\ldots$ from the beginning of the alphabet to denote primitive words. We denote the set of all primitive words by $\mathrm{Prim}$ . A word $u$ is a factor (i.e., a continuous subword) of every word $vuv^{\prime}$ . Given two words $p u s$ and $p^{\prime}u^{\prime}s^{\prime}$ , we say that the factors $u$ and $u^{\prime}$ have an overlap of size $k\in\mathbb{N}$ if $\bigl|\{|p|+1,\ldots,|p|+|u|\}\cap\{|p^{\prime}|+1,\ldots,|p^{\prime}|+|u^{\prime}|\}\bigr|=k$ . The overlap of $u$ and $u^{\prime}$ in the words $p u s$ and $p^{\prime}u^{\prime}s^{\prime}$ contains a conflict if there is a position $i$ with $|p|\leq i<|pu|$ and $|p^{\prime}|\leq i<|p^{\prime}u^{\prime}|$ such that the words $p u s$ and $p^{\prime}u^{\prime}s^{\prime}$ contain a different letter at position $i$ .
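The notions of primitive word, factor, and overlap size can be made concrete with the following Python sketch over ordinary strings. The function names are ours; `is_primitive` follows the definition literally by trying every candidate root length and is meant as an illustration, not as an efficient test.

```python
def is_primitive(w: str) -> bool:
    """w is primitive iff it is not v^i for any word v and any i > 1."""
    n = len(w)
    if n == 0:
        return False  # the empty word equals epsilon^2, so it is not primitive
    for d in range(1, n):
        # a candidate root v = w[:d] must have a length dividing |w|
        if n % d == 0 and w[:d] * (n // d) == w:
            return False
    return True


def is_factor(u: str, w: str) -> bool:
    """u is a factor of w iff w = v u v' for some words v, v'."""
    return u in w


def overlap_size(p: str, u: str, p2: str, u2: str) -> int:
    """Number of positions shared by the factor u of p.u.s and the factor u2 of p2.u2.s2."""
    first = set(range(len(p) + 1, len(p) + len(u) + 1))
    second = set(range(len(p2) + 1, len(p2) + len(u2) + 1))
    return len(first & second)
```

For instance, `aab` is primitive whereas `abab` is not, being the square of `ab`.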

Languages.

A language $\mathcal{L}$ over $\Sigma$ is a subset of $\Sigma^{*}$ . We will sometimes abuse notation and, given a word $w\in\Sigma^{*}$ , use $w$ to also denote the language $\{w\}$ . For two languages $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ , we use $\mathcal{L}_{1}\circ\mathcal{L}_{2}$ (or just $\mathcal{L}_{1}\mathcal{L}_{2}$ ) for their concatenation $\{uv\mid u\in\mathcal{L}_{1},v\in\mathcal{L}_{2}\}$ . A bounded iteration of a language $\mathcal{L}$ is defined as $\mathcal{L}^{0}\mathrel{\triangleq}\{\epsilon\}$ and $\mathcal{L}^{i+1}\mathrel{\triangleq}\mathcal{L}^{i}\circ\mathcal{L}$ for $i\geq 0$ . The (unbounded) iteration is $\mathcal{L}^{*}\mathrel{\triangleq}\bigcup_{i\geq 0}\mathcal{L}^{i}$ . For a word $w$ we use $\mathrm{Pref}(w)$ ( $\mathrm{Suf}(w)$ ) to denote the set of prefixes (suffixes) of $w$ and $\mathrm{F}(w)$ to denote the set of all factors of $w$ . We lift the definitions to languages as usual. A language $\mathcal{L}\subseteq\Sigma^{*}$ is flat iff it can be expressed as a finite union

$$\mathcal{L}=\bigcup_{i=1}^{N}w_{i,1}\circ w_{i,2}^{*}\circ w_{i,3}\circ w_{i,4}^{*}\circ w_{i,5}\circ\cdots\circ w_{i,\ell_{i}-1}^{*}\circ w_{i,\ell_{i}} \tag{1}$$

where every $w_{i,j}$ with $1\leq i\leq N$ and $1\leq j\leq\ell_{i}$ is a word over $\Sigma$ ; otherwise, it is non-flat. Flatness of $\mathcal{L}$ can be characterised by the absence of the so-called “butterfly loops”:

Fact 1.

A regular language $\mathcal{L}\subseteq\Sigma^{*}$ is non-flat iff $p\{u,v\}^{*}s\subseteq\mathcal{L}$ for some $p,s,u,v\in\Sigma^{*}$ with $u,v\not\in w^{*}$ for any word $w\in\Sigma^{*}$ .

Automata.

A (nondeterministic finite) automaton (NFA) over $\Sigma$ is a tuple $\mathcal{A}=(Q,\Delta,I,F)$ where $Q$ is a set of states, $\Delta$ is a set of transitions of the form $q\stackrel{a}{\rightarrow}r$ with $q,r\in Q$ and $a\in\Sigma$ , $I\subseteq Q$ is the set of initial states, and $F\subseteq Q$ is the set of final states. A run of $\mathcal{A}$ over a word $w=a_{1}\ldots a_{n}$ from state $q_{0}$ to state $q_{n}$ is a sequence of transitions $q_{0}\stackrel{a_{1}}{\rightarrow}q_{1}$ , $q_{1}\stackrel{a_{2}}{\rightarrow}q_{2}$ , $\ldots$ , $q_{n-1}\stackrel{a_{n}}{\rightarrow}q_{n}$ from $\Delta$ . The empty sequence is a run with $q_{0}=q_{n}$ over $\epsilon$ . We denote by $q_{0}\stackrel{w}{\leadsto}_{\mathcal{A}}q_{n}$ that $\mathcal{A}$ has such a run, and we drop the subscript $\mathcal{A}$ if it is clear from the context. The run is accepting if $q_{0}\in I$ and $q_{n}\in F$ , and the language of $\mathcal{A}$ is $\mathcal{L}(\mathcal{A})\mathrel{\triangleq}\{w\in\Sigma^{*}\mid q\stackrel{w}{\leadsto}r,q\in I,r\in F\}$ . Languages accepted by NFAs are called regular. $\mathcal{A}$ is a deterministic finite automaton (DFA) if $|I|=1$ and for every symbol $a\in\Sigma$ and every pair of transitions $q_{1}\stackrel{a}{\rightarrow}r_{1}$ and $q_{2}\stackrel{a}{\rightarrow}r_{2}$ in $\Delta$ it holds that if $q_{1}=q_{2}$ then $r_{1}=r_{2}$ .
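As an illustration of the definition of $\mathcal{L}(\mathcal{A})$ , the following Python sketch decides membership by tracking the set of states reachable over the prefixes of the input word. The representation of $\Delta$ as a dictionary from (state, symbol) pairs to sets of successor states is an assumption of this sketch, not notation used in the paper.

```python
def accepts(delta, initial, final, word):
    """Return True iff some run over `word` leads from an initial to a final state."""
    current = set(initial)
    for a in word:
        # follow every a-labelled transition leaving a currently reachable state
        current = {r for q in current for r in delta.get((q, a), ())}
    return bool(current & set(final))


# Hypothetical example automaton for the flat language (ab)^*.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
```

With `DELTA`, initial states `{0}`, and final states `{0}`, the words `abab` and the empty word are accepted, whereas `aba` is rejected.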

The $\neg\mathrm{Contains}$ constraint.

Let $\mathbb{X}$ be a set of (string) variables. A term is a word $t\in(\mathbb{X}\cup\Sigma)^{*}$ over variables and symbols. A $\neg\mathrm{Contains}$ constraint is a formula $\varphi\mathrel{\triangleq}\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ , where $\mathcal{N}$ and $\mathcal{H}$ (for $\mathcal{N}$eedle and $\mathcal{H}$aystack; $\varphi$ holds if we cannot find $\mathcal{N}$ within $\mathcal{H}$ ) are terms and $\Phi_{\mathcal{L}}\mathrel{\triangleq}\bigwedge_{x\in\mathbb{X}}x\in\mathcal{L}_{x}$ associates every variable $x$ with a regular language $\mathcal{L}_{x}$ . An assignment is a function $\sigma\colon\mathbb{X}\to\Sigma^{*}$ , i.e., it assigns strings to variables. We use $\sigma\mathbin{\triangleleft}\{x_{1}\mapsto w_{1},\ldots,x_{n}\mapsto w_{n}\}$ to denote the assignment obtained from $\sigma$ by setting the values of the variables $x_{1},\ldots,x_{n}$ to $w_{1},\ldots,w_{n}$ , respectively. We lift $\sigma$ to terms so that for $a\in\Sigma$ , we let $\sigma(a)\mathrel{\triangleq}a$ , and for terms $t,t^{\prime}$ , we let $\sigma(t\circ t^{\prime})\mathrel{\triangleq}\sigma(t)\circ\sigma(t^{\prime})$ . We then say that $\sigma$ satisfies $\varphi$ , written $\sigma\models\varphi$ , if $\sigma(x)\in\mathcal{L}_{x}$ for every $x\in\mathbb{X}$ and $\sigma(\mathcal{H})$ cannot be written as $u\circ\sigma(\mathcal{N})\circ v$ for any $u,v\in\Sigma^{*}$ , i.e., $\sigma(\mathcal{N})$ is not a factor of $\sigma(\mathcal{H})$ . We call a variable $z$ two-sided if it occurs in both $\mathcal{N}$ and $\mathcal{H}$ . Moreover, we use $\mathbb{X}_{\mathit{Flat}}$ to denote the set of variables $x$ occurring in $\varphi$ s.t. $\mathcal{L}_{x}$ is a flat language.
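The satisfaction relation $\sigma\models\varphi$ can be checked directly for a concrete assignment, as the following Python sketch shows. It rests on several assumptions of ours: terms are lists of tokens, a token is treated as a variable exactly when it is a key of `sigma`, and each $\mathcal{L}_{x}$ is given as a pattern for Python's `re` module restricted to genuinely regular operators.

```python
import re


def apply(sigma, term):
    """Lift the assignment to a term: variables are replaced, symbols are kept."""
    return "".join(sigma.get(t, t) for t in term)


def satisfies(sigma, needle, haystack, langs):
    """Check sigma |= not-Contains(needle, haystack) together with x in L_x for all x."""
    for x, pattern in langs.items():
        if re.fullmatch(pattern, sigma[x]) is None:
            return False  # sigma(x) is not in the language of x
    return apply(sigma, needle) not in apply(sigma, haystack)
```

Checking one assignment is easy; the difficulty addressed in the paper is deciding whether any satisfying assignment exists.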

Given a term $t$ , a variable $x\in\mathbb{X}$ , and a term $t_{s}$ , we use $t[x/t_{s}]$ to denote the term obtained by substituting every occurrence of the variable $x$ in $t$ by the term $t_{s}$ . Moreover, we use $\mathit{Vars}(t)$ to denote the set of variables with at least one occurrence in the term $t$ .

Theorem 2 ([19, Theorem 7.5]).

Satisfiability of the $\neg\mathrm{Contains}$ constraint is NP-hard.

2.1 Normalization

A variable $z$ is flat (non-flat) if the language $\mathcal{L}_{z}$ associated with $z$ is flat (non-flat), respectively, and finite if its corresponding language is finite. Moreover, a variable is called decomposed if its language can be represented by a DFA having a single initial, single final state, and containing exactly one nontrivial maximal strongly connected component (SCC) and no other SCCs. We say that $\varphi$ is normalized if it contains an occurrence of at least one variable, does not contain any finite variable, and all of its variables are decomposed. Any $\neg\mathrm{Contains}$ constraint can be transformed into a disjunction of normalized constraints, as shown by the following lemma.

Lemma 3.

Let $\varphi\mathrel{\triangleq}\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ . Then $\varphi$ can be transformed to an equisatisfiable disjunction $\bigvee_{1\leq i\leq n}\neg\mathrm{Contains}(\mathcal{N}_{i},\mathcal{H}_{i})\land\Phi_{\mathcal{L}_{i}}$ of normalized constraints or the formula $\mathit{true}$ .

Due to the previous lemma, in the rest of the paper we will focus on solving a single normalized $\neg\mathrm{Contains}$ constraint.

In the paper, we will also make use of the following result showing decidability of $\neg\mathrm{Contains}$ with only flat variables.

Lemma 4 ([19]).

Satisfiability of $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ where $\mathcal{L}_{x}$ is flat for any $x\in\mathbb{X}$ is decidable in NExpTime.

Proof sketch.

We can reduce $\varphi$ into an equisatisfiable Presburger arithmetic formula $\psi$ based on Parikh images of runs of the NFAs for the variables in $\varphi$ . Decidability of $\varphi$ follows from decidability of Presburger arithmetic. See [19] for details. $\hfill\blacktriangleleft$

The crucial fact that Lemma 4 depends on is that there is a one-to-one mapping between runs in NFAs of flat languages and their Parikh images; this mapping fundamentally breaks for non-flat languages so one cannot directly extend this technique to the non-flat case.

2.2 Lemmas in Our Toolbox

We introduce fundamental lemmas from the area of combinatorics on words that will be used throughout the rest of the paper. The following lemma will be useful to guarantee the existence of conflicts (i.e., non-matching positions) in sufficiently large overlaps of two words $\alpha^{M}$ and $\beta^{N}$ for some primitive words $\alpha,\beta\in\Sigma^{*}$ and large constants $M,N\in\mathbb{N}$ . Intuitively, we will control the choice of $\alpha$ and $\beta$ , and, thus, guarantee that $\alpha$ and $\beta$ cannot be powers of the same word, essentially applying the contraposition of the following lemma.

Lemma 5.

Let $\alpha\in\Sigma^{*}$ be a primitive word, and let $p$ and $s$ be two words such that $\alpha=ps$ . Then the word $s p$ is primitive.

Proof.

Assume that $\beta^{k}=sp$ for some $k\geq 2$ . Then we have $s=\beta^{l}u$ and $p=v\beta^{m}$ for $u\in\mathrm{Pref}(\beta)$ , $v\in\mathrm{Suf}(\beta)$ such that $\beta=uv$ and $l+m+1=k$ . Thus, we have

$$\alpha=v\beta^{m}\beta^{l}u=v(uv)^{m}(uv)^{l}u=(vu)^{l+m+1} \tag{2}$$

and so the word $\alpha$ is not primitive, a contradiction. $\hfill\blacktriangleleft$

Lemma 6 ([31, Proposition 1.2.1 (Fine and Wilf)]).

Let $x$ and $y$ be two words. If the words $x^{k}$ and $y^{l}$ , for any $k,l\in\mathbb{N}$ , share a common prefix of length at least $|x|+|y|-\mathbf{gcd}(|x|,|y|)$ , then $x$ and $y$ are powers of the same word.

Using Lemmas 5 and 6, we provide the following corollary that shows existence of conflicts between arbitrary overlaps of repetitions of primitive words of a sufficient size.

Corollary 7.

Let $u=\alpha^{M}$ and $v=\beta^{N}$ be two words where $\alpha,\beta\in\mathrm{Prim}$ , with $|\alpha|\neq|\beta|$ and $M,N\in\mathbb{N}$ . Then any overlap between $u$ and $v$ of the size at least $|\alpha|+|\beta|-\mathbf{gcd}(|\alpha|,|\beta|)$ contains a conflict.
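The bound of Lemma 6 can be explored on small examples with the following Python sketch. The helper names are ours; the exponents are chosen just large enough for both powers to be longer than the bound, which is an implementation choice of the sketch.

```python
from math import gcd


def fine_wilf_bound(x: str, y: str) -> int:
    """The threshold |x| + |y| - gcd(|x|, |y|) from Lemma 6."""
    return len(x) + len(y) - gcd(len(x), len(y))


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for c, d in zip(a, b):
        if c != d:
            break
        n += 1
    return n


def periodic_lcp(x: str, y: str) -> int:
    """Longest common prefix of x^k and y^l, both made longer than the bound."""
    bound = fine_wilf_bound(x, y)
    k = bound // len(x) + 1
    l = bound // len(y) + 1
    return common_prefix_len(x * k, y * l)
```

For $x=abaab$ and $y=aba$ , which are not powers of the same word, the bound is $5+3-1=7$ , while the powers agree only on the prefix $abaaba$ of length 6.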

A natural approach to showing that an assignment $\sigma$ satisfies $\varphi$ is to show that $\sigma(\mathcal{H})$ cannot be written as $\sigma(\mathcal{H})=p\circ\sigma(\mathcal{N})\circ s$ for any choice of words $p$ and $s$ . Therefore, one would have to consider all prefixes $p$ , infixes $u$ , and corresponding suffixes $s$ with $|u|=|\sigma(\mathcal{N})|$ and show that $\sigma(\mathcal{H})=pus$ implies $u\neq\sigma(\mathcal{N})$ . Note that the choice of the prefix $p\in\mathrm{Pref}(\sigma(\mathcal{H}))$ uniquely determines $u$ and $s$ , and, therefore, we can only refer to different prefixes when showing $\sigma\models\varphi$ . The following lemma reduces the number of prefixes we have to consider if we have information about primitive words that are factors of $\sigma(\mathcal{N})$ and $\sigma(\mathcal{H})$ .

Lemma 8 ([31, Proposition 12.1.3]).

Let $\alpha\in\Sigma^{*}$ be a primitive word, and let $\alpha^{2}=x\alpha y$ for some words $x,y\in\Sigma^{*}$ . Then either $x=\epsilon$ or $y=\epsilon$ , but not both.
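Lemma 8 says that a primitive word occurs in its own square only as the prefix and as the suffix. A brute-force Python sketch (the function name `occurrences` is ours) makes this visible by listing all starting positions of a pattern in a text.

```python
def occurrences(pattern: str, text: str) -> list:
    """All 0-based positions at which `pattern` starts in `text`."""
    last_start = len(text) - len(pattern)
    return [i for i in range(last_start + 1) if text.startswith(pattern, i)]
```

For the primitive word $aab$ , `occurrences("aab", "aabaab")` yields only the positions 0 and 3, whereas the non-primitive word $abab$ also occurs in the middle of its square, at position 2.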

We will use the next lemma as a recipe for constructing words $w_{z}\in\mathcal{L}_{z}$ for non-flat $\mathcal{L}_{z}$ such that $w_{z}$ has as a factor a primitive word that is sufficiently long for our proofs.

Lemma 9 ([34]).

Let $x^{K}=y^{L}z^{M}$ such that $x, y$ , and $z$ are string variables and $K, L$ and $M$ are integers such that $K,L,M\geq 2$ . Then any solution of the equation has the form $x=\alpha^{k}$ , $y=\alpha^{l}$ , and $z=\alpha^{m}$ for some word $\alpha$ and numbers $k,l,m\in\mathbb{N}$ .

We provide the following corollary to give insight into how we use Lemma 9 to construct factors that are primitive words of a suitable length.

Corollary 10.

Given two words $u$ and $v$ such that for any word $w$ it holds that $u,v\not\in w^{*}$ , we have that any word $\alpha=u^{L}v^{M}$ for $L,M\geq 2$ is primitive.

Proof.

By contradiction. Assume that $\alpha$ is not primitive, i.e., $\alpha=t^{K}=u^{L}v^{M}$ for some $t$ and $K,L,M\geq 2$ . Applying Lemma 9, we see that $u=w^{l}$ and $v=w^{m}$ for some $w$ , which contradicts the assumptions of the corollary. $\hfill\blacktriangleleft$
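Corollary 10 can be sanity-checked on small instances with the Python sketch below. Two standard facts from combinatorics on words, not stated in the paper, are used as shortcuts: a non-empty word $w$ is primitive iff its first occurrence in $ww$ after position 0 is at position $|w|$ , and two words are powers of the same word iff they commute. The function names and the exponent range are ours.

```python
def is_primitive(w: str) -> bool:
    """Primitivity test via occurrences of w inside w.w."""
    return bool(w) and (w + w).find(w, 1) == len(w)


def same_root(u: str, v: str) -> bool:
    """u and v are powers of the same word iff uv = vu."""
    return u + v == v + u


def corollary10_holds(u: str, v: str, max_exp: int = 4) -> bool:
    """Check that u^L v^M is primitive for all 2 <= L, M <= max_exp."""
    exps = range(2, max_exp + 1)
    return all(is_primitive(u * L + v * M) for L in exps for M in exps)
```

The check only covers finitely many exponents, so it illustrates the statement rather than proving it.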

2.3 Easy Fragments

Before we establish our main result giving the decidability of the hardest fragment of $\neg\mathrm{Contains}$ , we first describe what we consider easy fragments and how to deal with them. We assume a normalized $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ constraint.

1. The formula is solvable by length abstraction. This fragment contains formulae that can be solved easily by making the $\mathcal{N}$eedle longer than the $\mathcal{H}$aystack. Suppose $\mathcal{N}=t_{1}\ldots t_{m}$ and $\mathcal{H}=s_{1}\ldots s_{n}$ where every $t_{i}$ and $s_{j}$ is either a string variable $x\in\mathbb{X}$ or a symbol $a\in\Sigma$ . We can then create a Presburger arithmetic formula $\varphi_{\ell}$ over length variables $\{x_{\ell}\mid x\in\mathbb{X}\}$ such that $\varphi_{\ell}\mathrel{\triangleq}\sum_{1\leq i\leq m}\ell_{i}>\sum_{1\leq j\leq n}\ell_{j}\land\Psi$ . In the formula, $\ell_{i}$ and $\ell_{j}$ are either 1 (if $t_{i},s_{j}\in\Sigma$ ) or the length variable $x_{\ell}$ (if $t_{i},s_{j}=x$ ), and $\Psi$ is a formula constraining the possible values for the length variables (obtained, e.g., using the Parikh images of the variables’ languages). If $\varphi_{\ell}$ is satisfiable, so is the original $\neg\mathrm{Contains}$ .

2. All variables are flat. In this case, we can use Lemma 4.
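The length abstraction of the first fragment can be sketched in Python as follows. The sketch deviates from the text in two labelled ways: $\Psi$ is simplified to one arithmetic progression `(base, period)` per variable, standing for the lengths $\{\mathit{base}+k\cdot\mathit{period}\mid k\geq 0\}$ , and a bounded brute-force search replaces a Presburger arithmetic solver. All names are ours.

```python
from itertools import product


def term_length(term, lengths):
    """Length of a term: variables contribute their chosen length, symbols contribute 1."""
    return sum(lengths.get(t, 1) for t in term)


def solve_by_length(needle, haystack, length_sets, max_k=8):
    """Search for variable lengths making the needle strictly longer than the haystack."""
    names = sorted(length_sets)
    for ks in product(range(max_k + 1), repeat=len(names)):
        lengths = {
            x: length_sets[x][0] + k * length_sets[x][1]
            for x, k in zip(names, ks)
        }
        if term_length(needle, lengths) > term_length(haystack, lengths):
            return lengths
    return None  # nothing found within the search bound; this is inconclusive
```

For $\mathcal{N}=XX$ and $\mathcal{H}=Xb$ with $X$ ranging over words of even length, the search finds $|X|=2$ , since $4>3$ .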

3 Overview

We now move to our main result: deciding a hard instance of $\varphi\mathrel{\triangleq}\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ . We can classify normalized $\neg\mathrm{Contains}$ constraints (cf. Section 2.1) that do not fall in the fragments of Section 2.3 based on the occurrences of non-flat variables as follows:

1. constraints where a non-flat variable $x$ occurs both in $\mathcal{N}$ and $\mathcal{H}$ , and

2. constraints where all (and at least one) non-flat variables occur only in $\mathcal{H}$ .

Note that the cases not listed above, namely (a) all variables being flat and (b) a non-flat variable occurring only in $\mathcal{N}$ , are covered in Section 2.3. In particular, if there is a variable $x$ that only occurs in $\mathcal{N}$ , then $\mathcal{L}_{x}$ is infinite due to our normalization. Therefore, such a constraint can be solved by making $\mathcal{N}$ longer than $\mathcal{H}$ .

We distinguish the classes (1) and (2) above since for (1), the string substituted for some occurrence of $x$ in $\sigma(\mathcal{H})$ may overlap with the string for an occurrence of $x$ in $\sigma(\mathcal{N})$ . We deal with the class (1) by substituting two-sided non-flat variables $x$ with fresh symbols. In Section 4, we show that if there is a model $\sigma$ of the resulting $\neg\mathrm{Contains}$ , we can obtain a model $\sigma^{\prime}$ of the original constraint $\varphi$ from $\sigma$ by assigning $\sigma^{\prime}(x)$ to a long-enough word that ensures a mismatch for every overlap of $\sigma^{\prime}(x)$ in $\sigma^{\prime}(\mathcal{H})$ and $\sigma^{\prime}(x)$ in $\sigma^{\prime}(\mathcal{N})$ . By doing this, we reduce (1) to either (2) or $\neg\mathrm{Contains}$ over flat variables (potentially with no variables at all).

For deciding the class (2), given in detail in Section 6, we construct an equisatisfiable formula that uses flat underapproximations of languages associated with the remaining (as some might have been removed at step (1)) non-flat variables present in $\mathcal{H}$ . Our result is based on the observation that long words in a non-flat language may have a richer structure compared to long words one can construct using flat languages. Therefore, it is unlikely that a flat variable $z$ should have a large conflict-free overlap with a non-flat variable $x$ in an assignment that assigns these two variables sufficiently long words. In particular, we prove that the original language of $x$ can be underapproximated by a flat language while preserving equisatisfiability. After this step, the resulting constraint can be decided using Lemma 4.

4 Removing Two-Sided Non-Flat Variables

In this section, we will show how to transform a normalized constraint $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ with an occurrence of a two-sided non-flat variable $x$ into a constraint without occurrences of the variable $x$ . The resulting constraint after removing all two-sided non-flat variables can then be solved either by reduction to Presburger arithmetic (Lemma 4; if no non-flat variables remain in the constraint) or by the procedure in Section 6 (if there are still non-flat variables left in $\mathcal{H}$ ). The main result of this section is the following theorem.

Theorem 11.

Let $\varphi\mathrel{\triangleq}\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ be a constraint over the alphabet $\Sigma$ and let $z\in\mathit{Vars}(\mathcal{N})\cap\mathit{Vars}(\mathcal{H})$ be a non-flat variable. Then the formula $\varphi_{\texttt{\#}}\mathrel{\triangleq}\neg\mathrm{Contains}(\mathcal{N}[z/\texttt{\#}],\mathcal{H}[z/\texttt{\#}])\land\Phi_{\mathcal{L}}$ with $\texttt{\#}\notin\Sigma\cup\mathbb{X}$ is equisatisfiable to $\varphi$ .

The proof of the theorem is given below. It is based on the observation that assigning long words to two-sided variables necessarily causes some occurrences of the same variable to overlap. Since these variables are non-flat, we can construct long words with a rich internal structure that will guarantee that any sufficiently long overlap necessarily contains a conflict.

Before we give the proof, let us formally introduce the concept of words that do not allow conflict-free overlaps of two occurrences of the same word larger than a certain bound.

Definition 12 ( $\ell$ -aligned word).

Let $w$ be a word and $\ell\in\mathbb{N}$ . We say that $w$ is $\ell$ -aligned if for all $p\in\Sigma^{*}$ such that $1\leq|p|\leq|w|-\ell$ , $w$ is not a prefix of $p w$ .

Intuitively, $w$ is $\ell$ -aligned if it cannot overlap with itself on a prefix/suffix of the length larger than or equal to $\ell$ (except $|w|$ ). For example, the word $w=abaa$ is $2$ -aligned since for no non-empty word $p$ of the length at most $|w|-2=2$ it holds that $w$ is a prefix of $p w$ . On the other hand, $w=abaa$ is not $1$ -aligned since for $p=aba$ of the length 3, it holds that $w$ is a prefix of $pw=abaabaa$ .
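Definition 12 translates directly into a short Python check. The sketch uses the observation that $w$ is a prefix of $pw$ with $|p|=d\leq|w|$ exactly when $p$ is the prefix of $w$ of length $d$ and the suffix of $w$ starting at position $d$ equals the prefix of $w$ of length $|w|-d$ ; the function name `is_aligned` is ours.

```python
def is_aligned(w: str, ell: int) -> bool:
    """Check whether w is ell-aligned in the sense of Definition 12."""
    for d in range(1, len(w) - ell + 1):  # d ranges over the admissible lengths |p|
        # w is a prefix of p.w for some p with |p| = d iff w[d:] == w[:len(w) - d]
        if w[d:] == w[: len(w) - d]:
            return False
    return True
```

The sketch reproduces the example above: `is_aligned("abaa", 2)` holds, while `is_aligned("abaa", 1)` fails because of the shift by $|p|=3$ .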

4.1 Proof of Theorem 11

If $\varphi$ is satisfiable then so is $\varphi_{\texttt{\#}}$ . To see this, take a model $\sigma\models\varphi$ and replace the assignment of $z$ by #, producing $\sigma^{\prime}$ . Consider any alignment of $\sigma^{\prime}(\mathcal{N})$ against $\sigma^{\prime}(\mathcal{H})$ when checking whether $\sigma^{\prime}$ is a model of $\varphi_{\texttt{\#}}$ . Either some # is matched against a non-# symbol, which is a conflict, or every # in $\sigma^{\prime}(\mathcal{N})$ matches some # in $\sigma^{\prime}(\mathcal{H})$ . In the latter case, if the alignment were conflict-free, i.e., if $\sigma^{\prime}$ failed to be a model of $\varphi_{\texttt{\#}}$ , we would reach a contradiction with $\sigma$ being a model of $\varphi$ .

For the other direction, assume that $\varphi_{\texttt{\#}}$ is satisfiable, which means there is a model $\sigma^{\prime}$ of $\varphi_{\texttt{\#}}$ . Next, we will show how to construct a word $w_{z}$ s.t. $\sigma=\sigma^{\prime}\cup\{z\mapsto w_{z}\}$ is a model of $\varphi$ .

Focusing on variable $z$ , we can write the two sides of the $\neg\mathrm{Contains}$ constraint as $\mathcal{H}=\mathcal{H}_{0}z_{\mathcal{H},1}\mathcal{H}_{1}\cdots\mathcal{H}_{n-1}z_{\mathcal{H},n}\mathcal{H}_{n}$ and $\mathcal{N}=\mathcal{N}_{0}z_{\mathcal{N},1}\mathcal{N}_{1}\cdots\mathcal{N}_{m-1}z_{\mathcal{N},m}\mathcal{N}_{m}$ where $\mathcal{N}_{i},\mathcal{H}_{j}\in(\mathbb{X}^{\prime}\cup\Sigma)^{*}$ for each $i$ and $j$ assuming $\mathbb{X}^{\prime}=\mathbb{X}\setminus\{z\}$ . Moreover, we write the subscript $z_{S,k}$ to distinguish the $k$-th occurrence of $z$ in $S\in\{\mathcal{H},\mathcal{N}\}$ . As $\mathcal{L}_{z}$ is non-flat, we have that $p\{u,v\}^{*}s\subseteq\mathcal{L}_{z}$ for some words $p, u, v$ , and $s$ where $u$ and $v$ are not powers of the same word (Fact 1).

The core of our proof is based on the following observation. Since $\sigma^{\prime}$ is a model of $\varphi_{\texttt{\#}}$ , the word $\sigma^{\prime}(\mathcal{N}[z/\texttt{\#}])$ is not a factor of $\sigma^{\prime}(\mathcal{H}[z/\texttt{\#}])$ . Therefore, given any sufficiently long word $w_{z}\in\mathcal{L}_{z}$ , if the extension $\sigma=\sigma^{\prime}\cup\{z\to w_{z}\}$ fails to be a model, then there must be at least one occurrence of the word $\sigma(z)$ in $\sigma(\mathcal{N})$ partially overlapping with an occurrence of the word $\sigma(z)$ in $\sigma(\mathcal{H})$ , as shown in the picture below. Thus, if we construct a word $w_{z}\in\mathcal{L}_{z}$ that cannot partially overlap with itself, we get $\sigma$ that is a model of $\varphi$ .

Let $\alpha=u^{2}u^{k}v^{2}$ and $\beta=u^{2}v^{l}v^{2}$ be two words where $k=\mathbf{lcm}(|v|,|u|)/|u|$ and $l=\mathbf{lcm}(|v|,|u|)/|v|$ . By invoking Corollary 10, we see that both $\alpha$ and $\beta$ are primitive.

Note that we also have $|\alpha|=|\beta|$ (because $|\alpha|=2|u|+k|u|+2|v|$ , $|\beta|=2|u|+l|v|+2|v|$ and from the definition of $l$ and $k$ we have $k|u|=l|v|$ ). We now use these two primitive words $\alpha$ and $\beta$ to construct $w_{z}$ . Let $\gamma\triangleq\alpha^{r}\beta^{r}\circ\alpha^{r}\beta^{r}\circ\alpha^{2r}\beta^{2r}$ and let $w_{z}\in\mathcal{L}_{z}$ be the word $w_{z}\triangleq p\gamma s$ where $r\geq 2$ is the smallest number satisfying $r|\alpha|>M+|p|+|s|$ with $M=\max\{|\sigma(\mathcal{H}_{i})|,|\sigma(\mathcal{N}_{j})|:1\leq i\leq n,1\leq j\leq m\}$ . We set $\sigma=\sigma^{\prime}\cup\{z\mapsto w_{z}\}$ . Let us now give two lemmas establishing the properties of $w_{z}$ ’s infix $\gamma$ .

We constructed the infix $\gamma$ of $w_{z}$ in a way so that it prevents conflict-free overlaps with itself as shown by the following lemma.

Lemma 13.

The word $\gamma$ is $(r+1)|\alpha|$ -aligned.

The full proof of Lemma 13 can be found in [24], but the core of the argument lies in observing that in any overlap of $\gamma$ with itself of size at least $(r+1)|\alpha|$ , there is a factor $\alpha^{2}$ having an overlap with $\alpha$ of size $|\alpha|$ , or, similarly for $\beta$ . Therefore, one can apply Lemma 8 and limit overlaps that must be considered.

The following lemma shows that long overlaps between two occurrences of $\gamma$ are unavoidable when $\gamma$ has a sufficient length. The $a^{i}$ in the lemma is used just to position the overlap within $\sigma(\mathcal{H})$ and $\sigma(\mathcal{N})$ .

Lemma 14.

For each $0\leq i\leq|\sigma(\mathcal{H})|{-}|\sigma(\mathcal{N})|$ , every occurrence of $\gamma$ in $\sigma(\mathcal{H})$ has an overlap with some occurrence of $\gamma$ in $a^{i}\circ\sigma(\mathcal{N})$ of size at least $(r+1)|\alpha|$ , where $a$ is any symbol in $\Sigma$ .

Proof.

Let us assume an arbitrary occurrence of $\gamma$ in $\sigma(\mathcal{H})$ . Since each $\mathcal{H}_{i}$ and $\mathcal{H}_{i+1}$ are separated by $z$ (the same goes for $\mathcal{N}_{i}$ and $\mathcal{N}_{i+1}$ ), it suffices to consider only the pessimistic case, which is when an occurrence of $\gamma$ in $\mathcal{N}$ matches the longest $\sigma(\mathcal{H}_{i})$ with $p$ and $s$ on both sides. The situation is schematically depicted below.

In the figure, $o_{1}$ and $o_{2}$ denote the overlaps on both sides. We show that the size of at least one of the overlaps $o_{1}$ and $o_{2}$ is greater than $(r+1)|\alpha|$ by expressing the length $|\gamma|$ using its definition and the schematic above:

$$\begin{aligned}
r(4|\beta|+4|\alpha|) &= |o_{1}|+|s|+|\sigma(\mathcal{H}_{i})|+|p|+|o_{2}| && \text{(def.\ of $\gamma$)} \\
\Rightarrow\quad 7r|\alpha|+r|\alpha| &\leq |o_{1}|+r|\alpha|+|o_{2}| && \text{(since $|\alpha|=|\beta|$ and def.\ of $r$)} && (3) \\
\Leftrightarrow\quad 7r|\alpha| &\leq |o_{1}|+|o_{2}|
\end{aligned}$$

We have that $|o_{1}|+|o_{2}|\geq 7r|\alpha|$ , and, thus, at least one of $|o_{1}|$ and $|o_{2}|$ is bigger than $3r|\alpha|$ . Since $r\geq 2$ , we have $3r|\alpha|\geq(r+1)|\alpha|$ and hence $\gamma$ has an overlap of the required size. $\hfill\blacktriangleleft$

It remains to show that $\sigma$ is a model of $\varphi$ . For the sake of contradiction, assume that $\sigma$ is not a model, meaning that $\sigma(\mathcal{N})$ is a factor of $\sigma(\mathcal{H})$ . From Lemmas 13 and 14 we have that each occurrence of $\gamma$ in $\sigma(\mathcal{N})$ is perfectly aligned with some $\gamma$ in $\sigma(\mathcal{H})$ , which also means that the occurrences of $w_{z}$ are perfectly aligned. Furthermore, the occurrences of $w_{z}$ in $\sigma(\mathcal{N})$ are aligned with consecutive occurrences of $w_{z}$ in $\sigma(\mathcal{H})$ , i.e., there is some $0\leq k\leq n-m$ such that every $\sigma(z_{\mathcal{N},i})$ is aligned with $\sigma(z_{\mathcal{H},i+k})$ . If this were not the case and we had $\sigma(z_{\mathcal{N},1})$ aligned with $\sigma(z_{\mathcal{H},1+k})$ while some $\sigma(z_{\mathcal{N},i})$ were aligned with $\sigma(z_{\mathcal{H},i+k+l})$ for $l\geq 1$ , there would have to be some $\sigma(\mathcal{N}_{j})$ with $1\leq j<i$ and $|\sigma(\mathcal{N}_{j})|>|\sigma(z)|$ , which contradicts $w_{z}$ being longer than any $\sigma(\mathcal{N}_{j})$ by construction. Hence, for $\sigma^{\prime\prime}=\sigma\mathbin{\triangleleft}\{z\mapsto\texttt{\#}\}$ , $\sigma^{\prime\prime}(\mathcal{N})$ is also a factor of $\sigma^{\prime\prime}(\mathcal{H})$ , which contradicts $\sigma^{\prime}$ being a model of $\varphi_{\texttt{\#}}$ . Therefore, Theorem 11 holds.

5 $\Gamma$ -Expansion and Prefix/Suffix Trees

At this point, we are left with a normalized $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ constraint where all variables in $\mathcal{N}$ are flat. If all variables in $\mathcal{H}$ are also flat, we can use Lemma 4 and obtain the result. In the rest of the paper, we will deal with the case when $\mathcal{H}$ contains at least one non-flat variable. Before we give the proof in Section 6, in this section, we introduce two concepts that will be used later: $\Gamma$ -expansion on non-flat variables and prefix/suffix trees.

5.1 $\Gamma$ -Expansions on Non-Flat Variables

Intuitively, non-flat languages have words with a rich internal structure compared to flat languages. To illustrate, let $x$ be a flat variable with the language $\mathcal{L}_{x}=\alpha^{*}$ for some word $\alpha$ and let $z$ be a non-flat variable with the language $\mathcal{L}_{z}$ . Furthermore, let $w_{x}\in\mathcal{L}_{x}$ and $w_{z}\in\mathcal{L}_{z}$ be two sufficiently long words. We inspect the case when $w_{x}$ and $w_{z}$ share some long common factor $u$ . Since $w_{x}\in\alpha^{*}$ , we have $u=s\alpha^{k}p$ for some $s\in\mathrm{Suf}(\alpha)$ , $p\in\mathrm{Pref}(\alpha)$ , and $k\in\mathbb{N}$ . As $z$ is non-flat, the run of $\mathcal{A}_{z}$ corresponding to the word $w_{z}$ passes through states at which one can make a choice of which transition to take next. Since $u$ is long, we have to make a lot of “right” choices during the run of $\mathcal{A}_{z}$ in order to achieve the common factor $u$ . This highlights the difference between the complexity of $\mathcal{L}_{x}$ and $\mathcal{L}_{z}$ , and suggests that there is a way to pick $w_{z}$ to prevent long overlaps with flat variables occurring in $\mathcal{N}$ .

Guided by this intuition, we introduce a tool called $\Gamma_{z}$ -expansion of a non-flat variable $z$ . Given a prefix $p\in\mathrm{Pref}(\mathcal{L}_{z})$ and a suffix $s\in\mathrm{Suf}(\mathcal{L}_{z})$ , the $\Gamma_{z}$ -expansion of $(p,s)$ is the word $\Gamma_{z}(p,s)=pws\in\mathcal{L}_{z}$ for a particular $w$ such that only a prefix or a suffix of a bounded length can have long overlaps with (sufficiently long) words that belong to a flat language. This tool will play an important role in our proofs. Loosely speaking, if we start with a model $\sigma$ and we try to find an alternative model $\sigma^{\prime}=\sigma\mathbin{\triangleleft}\{z\mapsto\Gamma_{z}(p,s)\}$ , then the possible reasons why $\sigma^{\prime}$ fails to be a model are narrowed down to the choice of $p$ and $s$ .

In order to define the $\Gamma_{z}$ -expansion, we first need some auxiliary definitions. First, as a result of our normalization, the map $\mathrm{Base}\colon\mathbb{X}_{\mathit{Flat}}\rightarrow 2^{\Sigma^{*}}$ maps any flat variable $x$ to a singleton containing the primitive word $\alpha$ that forms the basis of $\mathcal{L}_{x}$ , i.e., $\mathrm{Base}(x)\triangleq\{\alpha\}$ such that $\mathcal{L}_{x}=(\alpha^{k})^{*}$ for some $k\in\mathbb{N}$ . We lift the definition of $\mathrm{Base}$ to a set $X$ of flat variables as $\mathrm{Base}(X)\triangleq\bigcup_{x\in X}\mathrm{Base}(x)$ , and to a string $s\in(\Sigma\cup\mathbb{X}_{\mathit{Flat}})^{*}$ as $\mathrm{Base}(s)\triangleq\mathrm{Base}(\mathit{Vars}(s)\cap\mathbb{X}_{\mathit{Flat}})$ .
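To make the definition of $\mathrm{Base}$ concrete, the following Python sketch computes the primitive root of the generator of a flat language and lifts it to a string of literals and variables. The representation is an assumption made purely for illustration: a flat language $(\alpha^{k})^{*}$ is given by its generator word $\alpha^{k}$ , a string is a list of tokens, and a token is a variable exactly when it is a key of the dictionary `flat_generators`.

```python
def primitive_root(word: str) -> str:
    """Return the primitive word alpha with word = alpha^k for some k >= 1."""
    if not word:
        raise ValueError("the empty word has no primitive root")
    # The first non-trivial occurrence of word in word+word marks its period.
    period = (word + word).find(word, 1)
    return word[:period]


def base_of_term(term, flat_generators):
    """Base(s): primitive roots of all flat variables occurring in the term.

    term            -- list of tokens (literals or variable names)
    flat_generators -- maps a flat variable x to a word g with L_x = g^*
    """
    return {primitive_root(flat_generators[t]) for t in term if t in flat_generators}


# Hypothetical instance: L_X = (abab)^*, L_Y = (ba)^*.
generators = {"X": "abab", "Y": "ba"}
bases = base_of_term(["ab", "X", "c", "Y", "X"], generators)
```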

Second, given a variable $z\in\mathbb{X}$ with $\mathcal{A}_{z}=(Q,\Sigma,\Delta,I,F)$ , we define the function $\mathrm{con}_{z}\colon Q\times Q\rightarrow\Sigma^{*}$ to give the lexicographically smallest word $\mathrm{con}_{z}(q,s)\triangleq w$ such that $q\stackrel{{\scriptstyle w}}{{\leadsto}}_{\mathcal{A}}s$ . Having auxiliary definitions in place, we are ready to define $\Gamma$ -expansion in the context of the formula $\varphi=\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ with $\mathcal{N}\in(\mathbb{X}_{\mathit{Flat}}\cup\Sigma)^{*}$ .

Definition 15 ( $\Gamma$ -expansion).

Let $z\in\mathit{Vars}(\mathcal{H})$ be a decomposed non-flat variable and $\mathcal{A}_{z}=(Q,\Sigma,\Delta,I,F)$ be a DFA s.t. $\mathcal{L}(\mathcal{A}_{z})=\mathcal{L}_{z}$ . Moreover, let $q_{u|v}\in Q$ be a state such that $q_{u|v}\stackrel{{\scriptstyle u}}{{\leadsto}}q_{u|v}$ and $q_{u|v}\stackrel{{\scriptstyle v}}{{\leadsto}}q_{u|v}$ with $u,v\not\in w^{*}$ for any word $w$ . Furthermore, let $p\in\mathrm{Pref}(\mathcal{L}_{z})$ be some prefix and $q_{p}$ be a state such that $q_{0}\stackrel{{\scriptstyle p}}{{\leadsto}}q_{p}$ for some $q_{0}\in I$ . Similarly, let $s\in\mathrm{Suf}(\mathcal{L}_{z})$ be a suffix and $q_{s}$ be a state such that $q_{s}\stackrel{{\scriptstyle s}}{{\leadsto}}q_{f}$ for some $q_{f}\in F$ .

Let $\gamma_{z}\triangleq u^{2+k}v^{2}$ for a minimal $k\in\mathbb{N}$ such that $|\gamma_{z}|>|\alpha|$ for any $\alpha\in\mathrm{Base}(\mathcal{N})$ . Given $K\in\mathbb{N}$ , we define the $\Gamma^{K}_{z}$ -expansion of $(p,s)$ to be the word $\Gamma^{K}_{z}(p,s)\triangleq p\circ\mathrm{con}(q_{p},q_{u|v})\circ\gamma_{z}^{K}\circ\mathrm{con}(q_{u|v},q_{s})\circ s$ .

Intuitively, $\Gamma_{z}$ -expansion takes a prefix $p$ and finds the shortest word $\mathrm{con}(q_{p},q_{u|v})$ that takes the automaton to the state $q_{u|v}$ in which we have the freedom to read the words $u$ and $v$ in any suitable sequence. We loop through $q_{u|v}$ in a specific manner so that the resulting factor $\gamma_{z}$ is primitive thanks to Corollary 10. The situation with the suffix is symmetric. Note that in the following section, we use prefix/suffix variants of the $\Gamma$ -expansion defined as $\Gamma_{\mathrm{Pref}(z)}^{K}\triangleq p\circ\mathrm{con}(q_{p},q_{u|v})\circ\gamma_{z}^{K}$ and $\Gamma_{\mathrm{Suf}(z)}^{K}\triangleq\gamma_{z}^{K}\circ\mathrm{con}(q_{u|v},q_{s})\circ s$ .
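Definition 15 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the DFA is a nested dictionary `delta[state][symbol]`, the function `con` returns a shortest connecting word with ties broken lexicographically (one way to reconcile "shortest" with "lexicographically smallest"), and the sample automaton for $(\texttt{a}\{\texttt{b},\texttt{c}\}\texttt{c})^{*}$ with $u=\texttt{abc}$ , $v=\texttt{acc}$ , and the states $q_{u|v}$ and $q_{s}$ are chosen by hand.

```python
from collections import deque


def run(delta, state, word):
    """Follow the transitions of the DFA along word."""
    for symbol in word:
        state = delta[state][symbol]
    return state


def con(delta, src, dst):
    """Shortest word leading from src to dst (ties broken lexicographically)."""
    queue, seen = deque([(src, "")]), {src}
    while queue:
        state, word = queue.popleft()
        if state == dst:
            return word
        for symbol in sorted(delta.get(state, {})):
            nxt = delta[state][symbol]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + symbol))
    raise ValueError("dst is not reachable from src")


def gamma_z(u, v, base_lengths):
    """u^(2+k) v^2 for the minimal k making it longer than every base word."""
    k = 0
    while len(u) * (2 + k) + 2 * len(v) <= max(base_lengths, default=0):
        k += 1
    return u * (2 + k) + v * 2


def gamma_expansion(delta, q0, q_uv, q_s, u, v, p, s, K, base_lengths):
    """p . con(q_p, q_uv) . gamma_z^K . con(q_uv, q_s) . s"""
    q_p = run(delta, q0, p)
    infix = gamma_z(u, v, base_lengths) * K
    return p + con(delta, q_p, q_uv) + infix + con(delta, q_uv, q_s) + s


# Hypothetical DFA for (a{b,c}c)^*: state 0 is initial and final.
delta = {0: {"a": 1}, 1: {"b": 2, "c": 2}, 2: {"c": 0}}
word = gamma_expansion(delta, q0=0, q_uv=0, q_s=1, u="abc", v="acc",
                       p="ab", s="cc", K=1, base_lengths=[2])
```

Here the prefix `ab` is completed by `c` to reach $q_{u|v}$ , and the suffix `cc` is preceded by `a` , so the resulting word stays in the language.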

The following lemma shows that $\Gamma_{z}$ -expansion can be seen almost as introducing a fresh symbol # for the infix $w$ connecting $p$ and $s$ into a word $pws\in\mathcal{L}_{z}$ . Intuitively, if we have an assignment $\sigma$ that assigns sufficiently long words to all flat variables, then we can find $K\in\mathbb{N}$ such that any large overlap between $\sigma(\mathcal{N})$ and $\sigma(z)=\Gamma^{K}_{z}(p,s)$ contains a conflict. We use $M_{\mathrm{Lit}}$ to be the length of the longest literal in $\varphi$ , $M_{Q}$ to be the number of states of the largest DFA specifying the language of some variable $x\in\mathbb{X}$ , and $M_{\alpha}\triangleq\max\{|\alpha|:\alpha\in\mathrm{Base}(\mathcal{N})\cup\{\gamma_{z}\}\}$ .

Lemma 16.

Let $z\in\mathbb{X}$ be a decomposed non-flat variable, $p\in\mathrm{Pref}(\mathcal{L}_{z})$ , and $s\in\mathrm{Suf}(\mathcal{L}_{z})$ . Further, let $K\in\mathbb{N}$ be such that $K|\gamma_{z}|\geq 4M_{\alpha}+2M_{\mathrm{Lit}}$ and let $\sigma$ be an assignment with (i) $|\sigma(x)|\geq 2M_{\alpha}$ for any flat variable $x$ and (ii) $\sigma(z)=\Gamma^{K}_{z}(p,s)$ . Every overlap between $\sigma(z)$ and $\sigma(\mathcal{N})$ of the size at least $\max(|p|,|s|)+M_{Q}+2M_{\mathrm{Lit}}+2M_{\alpha}$ contains a conflict.

Proof sketch.

It suffices to observe that any overlap of the size required by the lemma necessarily contains an overlap between $\gamma^{K}_{z}$ and $\sigma(x)=\alpha^{l}$ for some flat variable $x$ with $\alpha\in\mathrm{Base}(x)$ . The existence of a conflict follows from Corollary 7. The full proof can be found in [24]. $\hfill\blacktriangleleft$

Next, we show that $\Gamma_{z}$ -expansion can be used to facilitate modularity in our proofs, allowing us to search for a suitable prefix $p$ and a suitable suffix $s$ separately. Searching for $p$ and $s$ separately requires subtle modifications to $\varphi$ , resulting in us searching for $p$ and $s$ in the context of the modified formulae $\varphi_{\mathrm{Pref}}$ and $\varphi_{\mathrm{Suf}}$ , respectively. If we find models of $\varphi_{\mathrm{Pref}}$ and $\varphi_{\mathrm{Suf}}$ of a particular form, we compose them into a model of $\varphi$ .

Let $z\in\mathit{Vars}(\mathcal{H})$ be a non-flat variable, and let $z_{\mathrm{Pref}}$ and $z_{\mathrm{Suf}}$ be two fresh variables with their languages restricted to $z_{\mathrm{Pref}}\in\mathrm{Pref}(\mathcal{L}_{z})$ and $z_{\mathrm{Suf}}\in\mathrm{Suf}(\mathcal{L}_{z})$ . Let $\varphi_{\mathrm{Pref}}$ and $\varphi_{\mathrm{Suf}}$ be formulae defined as $\varphi_{\mathrm{Pref}}\triangleq\varphi[z/z_{\mathrm{Pref}}\circ\texttt{\#}]$ and $\varphi_{\mathrm{Suf}}\triangleq\varphi[z/\texttt{\#}\circ z_{\mathrm{Suf}}]$ where # is a fresh alphabet symbol. Further, let $\sigma^{\mathrm{Pref}}\models\varphi_{\mathrm{Pref}}$ and $\sigma^{\mathrm{Suf}}\models\varphi_{\mathrm{Suf}}$ be two models such that:

1. $\sigma^{\mathrm{Pref}}$ and $\sigma^{\mathrm{Suf}}$ agree on the values of variables other than $z_{\mathrm{Pref}}$ and $z_{\mathrm{Suf}}$ ,
2. $|\sigma^{\mathrm{Pref}}(x)|>2M_{\alpha}\land|\sigma^{\mathrm{Suf}}(x)|>2M_{\alpha}$ for any flat variable $x\in\mathbb{X}_{\mathit{Flat}}$ ,
3. $\sigma^{\mathrm{Pref}}(z_{\mathrm{Pref}})=p\gamma_{z}^{K}$ and $\sigma^{\mathrm{Suf}}(z_{\mathrm{Suf}})=\gamma_{z}^{L}s$ such that $n|\gamma_{z}|\geq 4M_{\alpha}+2M_{\mathrm{Lit}}$ for $n\in\{K,L\}$ .

Lemma 17.

The assignment $\sigma^{\mathrm{Pref}}\mathbin{\triangleleft}\{z\mapsto p\gamma_{z}^{K^{\prime}}s\}$ is a model of $\varphi$ where $K^{\prime}=\min(K,L)$ .

Proof sketch.

The idea behind the proof is that if the composed assignment $\sigma$ were not a model of $\varphi$ , then $\sigma(z)$ would need to have a large conflict-free overlap with $\sigma(x)$ for some flat variable $x\in\mathbb{X}$ . Applying Corollary 7, we reach a contradiction. The full proof of Lemma 17 can be found in [24]. $\hfill\blacktriangleleft$

5.2 Prefix (Suffix) Enumeration through Prefix (Suffix) Trees

Having defined $\Gamma^{K}$ -expansion that acts similarly to inserting a fresh symbol # between a chosen $p\in\mathrm{Pref}(\mathcal{L}_{z})$ and a suffix $s\in\mathrm{Suf}(\mathcal{L}_{z})$ of a non-flat variable $z\in\mathit{Vars}(\mathcal{H})$ , we can start enumerating prefixes $p\in\mathrm{Pref}(\mathcal{L}_{z})$ (or suffixes) up to a certain bound, while searching for a model. We introduce the concept of prefix (suffix) trees that play a major role in our proofs. Below, we give only the definition of a prefix tree; a suffix tree is defined symmetrically.

Definition 18 (Choice state).

Let $\mathcal{A}=(Q,\Delta,I,F)$ be a DFA. We say that a state $q\in Q$ is a choice state if $\big|\{(q,a,r)\in\Delta:a\in\Sigma,r\in Q\}\big|>1$ . We write $C(\mathcal{A})$ to denote the set of all choice states of $\mathcal{A}$ .

Definition 19 (Prefix tree).

Let $z\in\mathbb{X}$ be a variable with its language $\mathcal{L}_{z}$ given by a DFA $\mathcal{A}_{z}=(Q_{z},\Delta_{z},\{q_{0}\},F_{z})$ . We define $z$ ’s prefix tree $T_{z}=(V_{z},E_{z},r_{z},\mathrm{st}_{z},\mathcal{W}_{z})$ as an (infinite finitely-branching) tree with vertices $V_{z}$ rooted in $r_{z}\in V_{z}$ such that

$\blacksquare$ $\mathrm{st}_{z}\colon V_{z}\rightarrow Q_{z}$ is a function that labels non-root vertices of $T_{z}$ with $\mathcal{A}_{z}$ ’s choice states, i.e., $\mathrm{st}_{z}(v)\in C(\mathcal{A}_{z})$ for any $v\neq r_{z}$ and $\mathrm{st}_{z}(r_{z})=q_{0}$ ,

$\blacksquare$ $E_{z}\subseteq V_{z}\times\Sigma^{+}\times V_{z}$ is a set of labelled edges such that $(v,a_{1}\ldots a_{n},v^{\prime})\in E_{z}$ iff there is a run $\mathrm{st}_{z}(v)\stackrel{{\scriptstyle a_{1}}}{{\rightarrow}}q_{1}\stackrel{{\scriptstyle a_{2}}}{{\rightarrow}}\cdots\stackrel{{\scriptstyle a_{n}}}{{\rightarrow}}\mathrm{st}_{z}(v^{\prime})$ in $\mathcal{A}_{z}$ where for all $0<i<n$ it holds that $q_{i}\notin C(\mathcal{A}_{z})$ ,

$\blacksquare$ $\mathcal{W}_{z}\colon V_{z}\times V_{z}\rightharpoonup\Sigma^{*}$ is a partial function that maps any two vertices connected by an edge to the label on the edge, i.e., $\mathcal{W}_{z}(v,v^{\prime})=w$ iff there exists an edge $(v,w,v^{\prime})\in E_{z}$ and is undefined otherwise.

Intuitively, vertices of the tree are labeled by $\mathcal{A}_{z}$ ’s choice states $C(\mathcal{A}_{z})$ , i.e., states in which we can choose between multiple outgoing transitions along different alphabet symbols. Vertices $s$ and $s^{\prime}$ are connected by an edge in $T_{z}$ if $\mathrm{st}(s^{\prime})$ is reachable from $\mathrm{st}(s)$ without passing through any choice state.

A path $\pi$ in $T_{z}$ is a sequence of vertices $\pi=s_{0}\dots s_{n}$ where $(s_{i},w_{i},s_{i+1})\in E_{z}$ for any $0\leq i<n$ . We lift the definition of $\mathcal{W}$ to paths as $\mathcal{W}(s_{0}\ldots s_{n})\triangleq\mathcal{W}\big((s_{0},s_{1})\big)\circ\cdots\circ\mathcal{W}\big((s_{n-1},s_{n})\big)$ .
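The definitions of choice states, prefix-tree edges, and path words can be illustrated by a short Python sketch. The encoding is our assumption: the DFA is a nested dictionary `delta[state][symbol]`, and, since the tree is infinite, we only compute the edges leaving a vertex labelled with a given state instead of materializing $T_{z}$ . The sample automaton accepts $(\texttt{a}\{\texttt{b},\texttt{c}\}\texttt{c})^{*}$ .

```python
def choice_states(delta):
    """States with more than one outgoing transition."""
    return {q for q, out in delta.items() if len(out) > 1}


def edges_from(delta, state, choice):
    """Labels and targets of prefix-tree edges leaving a vertex labelled `state`.

    Each edge follows one outgoing symbol and then the unique transitions
    of non-choice states until the next choice state is reached.
    """
    edges = []
    for symbol in sorted(delta.get(state, {})):
        label, cur, visited = symbol, delta[state][symbol], set()
        while cur not in choice:
            out = delta.get(cur, {})
            if not out or cur in visited:  # dead state or choice-free cycle
                break
            visited.add(cur)
            (sym, nxt), = out.items()      # non-choice state: one transition
            label, cur = label + sym, nxt
        else:
            edges.append((label, cur))     # loop ended in a choice state
    return edges


def path_word(edge_labels):
    """W lifted to a path: concatenation of the labels of its edges."""
    return "".join(edge_labels)


# Hypothetical DFA for (a{b,c}c)^* with initial state 0.
delta = {0: {"a": 1}, 1: {"b": 2, "c": 2}, 2: {"c": 0}}
choice = choice_states(delta)
```

In this automaton, state 1 is the only choice state, so the root has a single edge labelled `a` and every other vertex has the two outgoing edges `bca` and `cca` .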

Definition 20 (Dead-end vertex of a prefix tree).

Let $\varphi=\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ and $T_{z}=(V,E,v_{0},\mathrm{st},\mathcal{W})$ be the prefix tree for $z\in\mathbb{X}$ , and let $\sigma\colon(\mathbb{X}\setminus\{z\})\rightarrow\Sigma^{*}$ be a partial assignment. A vertex $v_{n}\in V$ is called a dead end in $T_{z}$ w.r.t. $\sigma$ if $\sigma^{\prime}\not\models\varphi[z/z\texttt{\#}]$ where $\sigma^{\prime}\mathrel{\triangleq}\sigma\mathbin{\triangleleft}\{z\mapsto\mathcal{W}(v_{0}\ldots v_{n})\}$ for $v_{0}\ldots v_{n}$ being the (single) path between $v_{0}$ and $v_{n}$ in $T_{z}$ .

Intuitively, dead-end vertices (and all vertices that are below them in the prefix tree) are not interesting for obtaining a $\neg\mathrm{Contains}$ model. Consider, e.g., $\varphi\triangleq\neg\mathrm{Contains}(\texttt{ab}x,xz)\land\Phi_{\mathcal{L}}$ with $\mathcal{L}_{x}=(\texttt{ab})^{+}$ and $\mathcal{L}_{z}=(\texttt{a}\{\texttt{b},\texttt{c}\}\texttt{c})^{*}$ . We have $\varphi[z/z\texttt{\#}]=\neg\mathrm{Contains}(\mathcal{N}^{\prime},\mathcal{H}^{\prime})=\neg\mathrm{Contains}(\texttt{ab}x,xz\texttt{\#})$ and, thus, the vertex $v\in V_{z}$ corresponding to the prefix abca is a dead end in $T_{z}$ w.r.t. $\sigma=\{x\mapsto\texttt{ab}\}$ since $\sigma^{\prime}(\mathcal{N}^{\prime})=\texttt{abab}$ is a factor of $\sigma^{\prime}(\mathcal{H}^{\prime})=\texttt{ababca}\texttt{\#}$ .
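The dead-end test from this example can be replayed with a few lines of Python. The sketch assumes a simple encoding that is ours, not the paper's: both sides of the constraint are lists of tokens, variables are written in upper case ( `X` , `Z` ) to keep them apart from alphabet symbols, and the substitution $[z/z\texttt{\#}]$ is realized by appending `#` to the candidate prefix.

```python
def instantiate(term, sigma):
    """Replace every variable token by its value; literals are kept as they are."""
    return "".join(sigma.get(token, token) for token in term)


def is_dead_end(needle, haystack, sigma, z, prefix):
    """True iff assigning `prefix` to z falsifies not-contains in phi[z / z#]."""
    extended = dict(sigma)
    extended[z] = prefix + "#"  # z followed by the fresh symbol #
    return instantiate(needle, extended) in instantiate(haystack, extended)


# The instance from the text: not-contains(ab x, x z) with sigma = {x -> ab}.
needle, haystack = ["ab", "X"], ["X", "Z"]
sigma = {"X": "ab"}
abca_is_dead_end = is_dead_end(needle, haystack, sigma, "Z", "abca")
acca_is_dead_end = is_dead_end(needle, haystack, sigma, "Z", "acca")
```

The prefix `abca` is a dead end because `abab` occurs in `ababca#` , whereas the sibling prefix `acca` is not, since `abab` does not occur in `abacca#` .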

Definition 21 ( $H$ -reaching path).

Let $\pi=v_{0}\dots v_{n}$ be a path in a prefix tree $T_{z}=(V,E,v_{0},\mathrm{st},\mathcal{W})$ and $H\in\mathbb{N}$ . We say that $\pi$ is $H$ -reaching if $|\mathcal{W}(v_{0}\ldots v_{n})|\geq H\geq|\mathcal{W}(v_{0}\ldots v_{n-1})|$ .

In our proof, we explore all prefixes of words in a language up to a certain bound $H$ . As we have a prefix tree with edges labelled with words of (possibly) different lengths, stating that we have explored all prefixes of the length precisely $H$ is problematic. Hence, the concept of $H$ -reaching paths is a relaxation allowing paths (prefixes) to slightly vary in length.
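The condition of Definition 21 only depends on the lengths of the edge labels along the path. A minimal Python sketch, assuming a path is represented by the non-empty list of its edge labels (the function name `is_h_reaching` is ours):

```python
def is_h_reaching(edge_labels, H):
    """|W(v_0 ... v_n)| >= H >= |W(v_0 ... v_{n-1})| for a path with n >= 1 edges."""
    if not edge_labels:
        raise ValueError("a path needs at least one edge")
    total = sum(len(label) for label in edge_labels)
    before_last = total - len(edge_labels[-1])
    return total >= H >= before_last


# Hypothetical path with edge labels a, bca, cca: lengths 1, 4, 7 along the path.
labels = ["a", "bca", "cca"]
```

For this path, the bound $H$ may be any value between 4 and 7, which reflects the intended slack: the path is the first point at which the prefix reaches length $H$ .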

6 Underapproximating Non-Flat Variables

In this section, we give the main lemma allowing us to underapproximate the language of non-flat variables with a flat language. Throughout this section we use three constants $\lambda_{\mathrm{Flat}},\lambda_{\mathrm{Q}},\lambda_{\kappa}\in\mathbb{N}$ with the following semantics:

$\blacksquare$ The constant $\lambda_{\mathrm{Q}}$ is the length of prefixes (suffixes) of non-flat variables that we will enumerate in our proofs, searching for a model that is shorter w.r.t. some non-flat variable.

$\blacksquare$ The constant $\lambda_{\mathrm{Flat}}$ is the minimal size of words assigned to flat variables occurring in $\mathcal{N}$ .

$\blacksquare$ The constant $\lambda_{\kappa}$ is used as the value of the parameter $K$ in every application of $\Gamma_{\mathrm{Pref}(z)}^{K}$ or $\Gamma_{\mathrm{Suf}(z)}^{K}$ .

First, let us define parameters of $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})$ that we use to define the above bounds. Let $M_{\mathrm{Lit}}$ be the length of the longest string literal in $\mathcal{N}$ and $\mathcal{H}$ , let $M_{Q}$ be the largest number of states of a DFA associated with some variable. Furthermore, let $M_{\alpha}$ be the length of the longest word in the set $W_{\alpha}(\mathcal{N})\cup W_{\gamma}$ where $W_{\gamma}$ is the set of the primitive words $\gamma_{z}$ used to define the $\Gamma_{z}$ -expansion for every non-flat variable $z\in\mathit{Vars}(\mathcal{H})$ .

Since $\lambda_{\mathrm{Flat}}$ and $\lambda_{\kappa}$ depend on the value of $\lambda_{\mathrm{Q}}$ , we start by fixing $\lambda_{\mathrm{Q}}\triangleq 2M_{\alpha}M_{Q}+M_{\mathrm{Lit}}$ . Intuitively, for any non-flat variable $z\in\mathbb{X}$ , we set up $\lambda_{\mathrm{Q}}$ in a way so that if we consider all prefixes in $T_{z}$ up to the length $M_{\alpha}M_{Q}$ , then $T_{z}$ will contain paths through any state $q\in Q_{z}$ since $\mathcal{A}_{z}$ is a single SCC. After extending these paths up to the length $\lambda_{\mathrm{Q}}$ , we can guarantee that $T_{z}$ will contain all words read from any state $q$ of the length at least $M_{\alpha}$ . Considering all possible words of the length $|\beta|$ for some $\beta\in\mathrm{Base}(\mathcal{N})$ readable from a state will be crucial later, as we will show that there can be only a few such words if we fail to find an alternative model $\sigma^{\prime}\triangleq\sigma\mathbin{\triangleleft}\{z\mapsto w_{z}\}$ s.t. $|\sigma^{\prime}(z)|<|\sigma(z)|$ , assuming the existence of a model $\sigma$ .

The remaining bounds $\lambda_{\mathrm{Flat}}$ and $\lambda_{\kappa}$ are defined as $\lambda_{\mathrm{Flat}}\triangleq\lambda_{\mathrm{Q}}+4M_{\alpha}+M_{Q}$ and $\lambda_{\kappa}\triangleq\lambda_{\mathrm{Flat}}+2M_{\alpha}+2M_{\mathrm{Lit}}$ . Ignoring some technical details and due to reasons that will be revealed shortly, we need $\lambda_{\mathrm{Flat}}$ to be slightly longer than $\lambda_{\mathrm{Q}}$ , so that when we later construct $\sigma^{\prime}\triangleq\sigma\mathbin{\triangleleft}\{z\mapsto p\}$ for some particular prefix $p$ with $|p|\leq\lambda_{\mathrm{Q}}+M_{Q}$ , we can establish a part of the string that precedes an occurrence of $z$ in $\sigma_{|\mathbb{X}\setminus\{z\}}(\mathcal{H})$ in the case $\sigma^{\prime}$ fails to be a model. Finally, $\lambda_{\kappa}$ is set up so that together with $\lambda_{\mathrm{Flat}}$ they allow Lemma 17 to be applied, where $\lambda_{\mathrm{Q}}$ and $\lambda_{\mathrm{Flat}}$ play the role of $K_{0}$ and $N_{0}$ , respectively.

We remark that the exact values of $\lambda_{\mathrm{Q}}$ , $\lambda_{\mathrm{Flat}}$ , and $\lambda_{\kappa}$ are not important when reading the proof for the first time. It is sufficient to note that $\lambda_{\mathrm{Q}}<\lambda_{\mathrm{Flat}}<\lambda_{\kappa}$ , and that the difference in sizes between these bounds is sufficiently large.
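For a quick sanity check of how the three bounds relate, they can be computed directly from $M_{\alpha}$ , $M_{Q}$ , and $M_{\mathrm{Lit}}$ . The Python sketch below merely transcribes the definitions given above; the function name and the sample parameter values are illustrative assumptions.

```python
def bounds(m_alpha: int, m_q: int, m_lit: int):
    """Return (lambda_Q, lambda_Flat, lambda_kappa) as defined in this section."""
    lam_q = 2 * m_alpha * m_q + m_lit
    lam_flat = lam_q + 4 * m_alpha + m_q
    lam_kappa = lam_flat + 2 * m_alpha + 2 * m_lit
    return lam_q, lam_flat, lam_kappa


# Hypothetical parameters: longest base word 12, largest DFA 3 states,
# longest literal 2 symbols.
lam_q, lam_flat, lam_kappa = bounds(m_alpha=12, m_q=3, m_lit=2)
```

With these sample values we obtain $\lambda_{\mathrm{Q}}=74$ , $\lambda_{\mathrm{Flat}}=125$ , and $\lambda_{\kappa}=153$ , in line with the ordering $\lambda_{\mathrm{Q}}<\lambda_{\mathrm{Flat}}<\lambda_{\kappa}$ .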

6.1 Overcoming the Infinite by Equivalence with a Finite Index

Our procedure to decide $\varphi=\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})$ containing non-flat variables in $\mathcal{H}$ originates in the enumeration of partial assignments $\eta\colon\mathit{Vars}(\mathcal{N})\rightarrow\Sigma^{*}$ , since it is easy to find suitable values for non-flat variables when $\mathcal{N}$ is a literal, which is the case once we fix the values of all variables in $\mathcal{N}$ . The problem is that there is an infinite number of such assignments. Our key observation, which allows us to prove that we can underapproximate non-flat languages using flat ones, is that the precise values of flat variables occurring in $\mathcal{N}$ do not matter as long as these variables are assigned sufficiently long words. In general, however, a model $\sigma\models\varphi$ might assign long words only to a subset of flat variables. Therefore, in our decision procedure, we first guess the set $X$ of flat variables that are assigned words shorter than $\lambda_{\mathrm{Flat}}$ . Since there are finitely many such words, we have a finite number of possible choices $\tau\colon X\rightarrow\Sigma^{*}$ of values these variables can attain. We enumerate all possible valuations $\tau$ , and for every such valuation $\tau$ we produce a new constraint $\varphi_{\tau}$ in which we replace every short variable $x\in X$ by the word $\tau(x)$ . The regular constraints restricting the remaining flat variables are modified to permit only words longer than $\lambda_{\mathrm{Flat}}$ , allowing us to assume in our proofs that flat variables in $\varphi_{\tau}$ are assigned sufficiently long words.
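The finiteness argument for the valuations $\tau$ can be made tangible with a small enumeration. The following Python sketch handles one guessed set $X$ of short variables; it assumes, for illustration only, that each flat language has the form $g^{*}$ and is represented by its non-empty generator word $g$ , and that the bound `lam` plays the role of $\lambda_{\mathrm{Flat}}$ .

```python
from itertools import product


def short_words(generator: str, lam: int):
    """All words of generator^* that are strictly shorter than lam."""
    if not generator:
        raise ValueError("the generator must be non-empty")
    words, word = [], ""
    while len(word) < lam:
        words.append(word)
        word += generator
    return words


def short_valuations(generators, lam):
    """All valuations tau: X -> Sigma^* with |tau(x)| < lam and tau(x) in L_x."""
    names = sorted(generators)
    choices = [short_words(generators[x], lam) for x in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


# Hypothetical guess X = {X, Y} with L_X = (ab)^*, L_Y = c^*, and bound 3.
valuations = short_valuations({"X": "ab", "Y": "c"}, lam=3)
```

Each returned dictionary corresponds to one constraint $\varphi_{\tau}$ ; the number of candidates is the product of the numbers of short words per variable, here $2\cdot 3=6$ .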

Let us formalize our observation that the precise length of words assigned to flat variables does not matter as long as they are sufficiently long. Let $\eta\colon\mathit{Vars}(\mathcal{N})\rightarrow\Sigma^{*}$ be a partial assignment and let $\lambda\in\mathbb{N}$ be a constant. We define the set $X_{<\lambda}(\eta)\triangleq\big\{x\in\mathit{Vars}(\mathcal{N}):|\eta(x)|<\lambda\big\}$ . We say that two partial assignments $\eta$ and $\vartheta$ are $\lambda$ -equivalent, denoted by $\eta\sim_{\lambda}\vartheta$ , iff $\eta_{|X_{<\lambda}(\eta)}=\vartheta_{|X_{<\lambda}(\vartheta)}$ .

Clearly, $\sim_{\lambda}$ has a finite index and if there exists a model $\sigma$ of $\varphi$ , then its restriction $\sigma_{|\mathit{Vars}(\mathcal{N})}$ will fall into one of the equivalence classes induced by $\sim_{\lambda}$ . Setting $\lambda=\lambda_{\mathrm{Flat}}$ , we inspect all equivalence classes, checking whether any of them contains a model. Given a representative $\eta$ of an equivalence class, we replace all variables $x\in\mathit{Vars}(\mathcal{N})$ with $\eta(x)$ if $|\eta(x)|<\lambda_{\mathrm{Flat}}$ , producing a new constraint $\varphi_{\eta}$ . Furthermore, we need to include the fact that the remaining variables in $\mathcal{N}$ are assigned long words. Therefore, all variables $y\in\mathit{Vars}(\mathcal{N})$ such that $|\eta(y)|\geq\lambda_{\mathrm{Flat}}$ will have their languages restricted to a new language $\mathcal{L}^{\prime}_{y}=\mathcal{L}_{y}\cap\{w\in\Sigma^{*}\mid|w|\geq\lambda_{\mathrm{Flat}}\}$ in $\varphi_{\eta}$ . The resulting constraint $\varphi_{\eta}$ is clearly equisatisfiable to $\varphi$ with models restricted to be $\sim_{\lambda_{\mathrm{Flat}}}$ -equivalent to $\eta$ .
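The relation $\sim_{\lambda}$ compares two partial assignments only on the variables they map to short words. A minimal Python sketch, assuming partial assignments are dictionaries from variable names to words (the function names and sample assignments are ours):

```python
def short_part(eta, lam):
    """Restriction of eta to X_{<lam}(eta), the variables with short values."""
    return {x: w for x, w in eta.items() if len(w) < lam}


def lambda_equivalent(eta, theta, lam):
    """eta ~_lam theta iff their restrictions to short variables coincide."""
    return short_part(eta, lam) == short_part(theta, lam)


# Hypothetical assignments over Vars(N) = {X, Y} and lam = 5.
eta = {"X": "ab", "Y": "cccccc"}
theta_long = {"X": "ab", "Y": "cccccccc"}   # differs only on a long value
theta_short = {"X": "abab", "Y": "cccccc"}  # differs on a short value
theta_mixed = {"X": "ab", "Y": "cc"}        # Y is short here but long in eta
```

Only `theta_long` is equivalent to `eta` : the two assignments agree on the short variable `X` and both map `Y` to a long word, whose precise value is irrelevant.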

Some of these instances can be decided without any additional work. In particular, if we have an assignment $\eta$ such that $X_{<\lambda_{\mathrm{Flat}}}(\eta)=\mathit{Vars}(\mathcal{N})$ , i.e., all variables occurring in $\mathcal{N}$ are short, we fix values of all variables in $\mathcal{N}$ , and, thus, the $\mathcal{N}$ eedle of $\varphi_{\eta}$ is a word. The remaining instances with $X_{<\lambda_{\mathrm{Flat}}}(\eta)\subset\mathit{Vars}(\mathcal{N})$ contain at least one occurrence of a (long) variable in the $\mathcal{N}$ eedle of $\varphi_{\eta}$ , and, thus, their decidability requires investigation.

6.2 Inspecting the Structure of Non-flat Variables in the Presence of Long Flat Variables

Throughout this section, we fix $\varphi$ to be a $\neg\mathrm{Contains}$ instance resulting from the previous section, i.e., $\varphi\triangleq\varphi_{\eta}=\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L},\eta}$ for some equivalence class representative $\eta$ such that $\mathit{Vars}(\mathcal{N})\neq\emptyset$ . We start by stating the key theorem for our decidability result.

Theorem 22.

Let $z$ be a (decomposed) non-flat variable present in $\mathcal{H}$ . There is a flat language $\mathcal{L}^{\mathrm{Flat}}_{z}\subset\mathcal{L}_{z}$ s.t. if there exists a model $\sigma\models\varphi$ , then there exists a model $\sigma^{\prime}\models\varphi[z/z\texttt{\#}]$ s.t. $\sigma^{\prime}\triangleq\sigma\mathbin{\triangleleft}\{z\mapsto w_{z}\}$ for some word $w_{z}\in\mathcal{L}^{\mathrm{Flat}}_{z}$ .

Before presenting the rather technical lemmas that allowed us to obtain the result, let us derive some intuition on why the theorem holds. Assume that we have a model $\sigma$ of $\varphi$ and we pick some long prefix $p$ and a long suffix $s$ for the variable $z$ , and we glue them together using $\Gamma_{z}$ -expansion to produce a word $w\triangleq\Gamma^{\lambda_{\kappa}}_{z}(p,s)$ and an altered assignment $\sigma^{\prime}\triangleq\sigma\mathbin{\triangleleft}\{z\mapsto w\}$ . The core of the theorem lies in analyzing the situation when $\sigma^{\prime}$ fails to be a model. By symmetry, we focus on the case when our choice of the prefix $p$ is problematic. We have two possibilities.

$\blacksquare$ There is a short prefix of $p$ due to which $\sigma^{\prime}$ fails to be a model. We address this by systematically exploring the prefix tree of $z$ up to a certain bound, marking the vertices that correspond to such prefixes as dead ends.

$\blacksquare$ Our choice of $p$ does not cause $\sigma^{\prime}$ to immediately fail to be a model; however, by applying $\Gamma_{z}$ -expansion we introduce an infix due to which $\sigma^{\prime}\not\models\varphi$ . Since we assume that $\varphi$ results from the previous section, we know that all flat variables are assigned long words, i.e., $\sigma^{\prime}(\mathcal{N})$ contains long factors of the form $\alpha^{k}$ for some $\alpha$ which forms the basis of a flat variable $x\in\mathit{Vars}(\mathcal{N})$ and $k\in\mathbb{N}$ . $\Gamma_{z}$ -expansion glues together a prefix and a suffix using a word $\gamma^{K}_{z}$ where $|\gamma_{z}|\neq|\alpha|$ for any base $\alpha$ . Therefore, we know that only a limited part of the infix introduced by $\Gamma_{z}$ -expansion is problematic, otherwise we would have a long overlap between $\gamma^{K}_{z}$ and some factor $\alpha^{k}$ of $\sigma^{\prime}(\mathcal{N})$ . Thus, $p$ contains a long factor $\alpha^{k}$ for some $k\geq 1$ and a primitive word $\alpha$ that forms the base of a flat variable present in $\mathcal{N}$ . We carefully analyze the effect of such a factor on the structure of $\mathcal{A}_{z}$ .

We now provide an overview of lemmas that lead to Theorem 22. Since these lemmas are quite technical, we accompany them with intuition and only sketch their proofs. Full proofs can be found in [24]. To simplify the presentation, we focus primarily on attempting to find a suitable prefix of a non-flat variable, and hence, our results are formulated in the context of a modified formula that contains a fresh alphabet symbol #. Since the situation is symmetric for suffixes, we can use the properties of $\Gamma_{z}$ -expansion (Lemma 17) and glue together a suitable prefix and a suitable suffix to produce an altered model.

We start with a technical lemma used frequently in our proofs. The lemma shows that if we know that $\alpha$ is a factor of $\mathcal{H}$ , and we know that a part of $\mathcal{H}$ in the proximity of the factor $\alpha$ is incompatible with $\alpha$ , then we can show that a large number of overlaps between $\sigma(\mathcal{N})$ and $\sigma(\mathcal{H})$ must contain a conflict if $\mathcal{N}$ contains a large factor of the form $\alpha^{N}$ .

Lemma 23.

Let $\alpha$ and $\gamma$ be two primitive words, such that $|\alpha|\neq|\gamma|$ . Let $\mathcal{H}=t\alpha u\gamma^{\lambda_{\kappa}}$ and $\mathcal{N}=v\alpha^{N}w$ where $t, u, v$ and $w$ are (possibly empty) words such that $N|\alpha|>\lambda_{\mathrm{Flat}}$ and $|u|<\lambda_{\mathrm{Flat}}-2\max(|\alpha|,|\gamma|)$ . If 1. the prefix $p$ of $\mathcal{H}$ of the size $|p|=|t|+|\alpha|+|w|$ does not contain the word $\mathcal{N}$ , and 2. $\alpha\not\in\mathrm{Pref}(v_{0})$ and $v_{0}\not\in\mathrm{Pref}(\alpha)$ , then $\mathcal{N}$ is not a factor of $\mathcal{H}$ .

Proof sketch.

Since $\mathcal{N}$ contains a long factor $\alpha^{N}$ and $\mathcal{H}$ contains at least one $\alpha$ , we can apply Lemma 8 to rule out a lot of overlaps that might be conflict-free. All of the remaining overlaps contain a conflict thanks to Condition 2. The full proof is available in [24]. $\hfill\blacktriangleleft$

Lemma 24.

Let $x\in\mathbb{X}$ be a flat variable with $\mathrm{Base}(x)=\{\alpha\}$ , and let $z\in\mathit{Vars}(\mathcal{H})$ be a (decomposed) non-flat variable. Let $\varphi\triangleq\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})=\neg\mathrm{Contains}(\mathcal{N}^{\prime}x\alpha^{M}pW,\mathcal{H})$ be a formula with $p\neq\alpha$ being a prefix of $\alpha$ and $W=a_{0}\dots a_{n}$ being a non-empty word such that the word $pa_{0}$ is not a prefix of $\alpha$ .

If there exists a model $\sigma$ with $\sigma(z)$ being of the form $\sigma(z)=s\alpha^{k}pWV$ for some word $V$, $k\geq 1$, and a suffix $s$ of $\alpha$, then $\sigma\mathbin{\triangleleft}\big\{z\mapsto\Gamma_{\mathrm{Pref}(z)}^{\lambda_{\kappa}}(s\alpha^{k}pW)\big\}$ is a model of $\varphi[z/z\texttt{\#}]$.

Intuitively, the rightmost variable in $\mathcal{N}$ is the flat variable $x$ with $\mathrm{Base}(x)=\{\alpha\}$. To the right of $x$, there is a literal with the prefix $\alpha^{M}p$ that resembles the flat language $\mathcal{L}_{x}$. Moreover, $\sigma(z)$ also starts with a prefix $s\alpha^{k}p$ resembling $\mathcal{L}_{x}$, followed by $W$. Thus, the prefix $s\alpha^{k}pW$ of the word $\sigma(z)$ mimics the suffix of the right-hand side $\sigma(\mathcal{N})$. Hence, if we look solely at the prefix of $\sigma(z)$ and the suffix of $\sigma(\mathcal{N})$, there are no obvious conflicts.

However, $\sigma\models\varphi$ , and, therefore, there must be a conflict outside of $z$ when considering the above alignment. The rest of the proof can be found in [24].
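The alignment described above can be seen on a small instance. The values below are chosen here purely for illustration and do not come from the paper: take $\alpha=ab$, $p=a$, $W=c$, $s=b$, $k=2$, and $M=1$, so that $pa_{0}=ac$ is not a prefix of $\alpha$. Assuming additionally that $\sigma(x)$ ends with at least one copy of $\alpha$, we get

$$s\alpha^{k}pW=b\cdot abab\cdot a\cdot c,\qquad\sigma(x)\,\alpha^{M}pW=\cdots ab\cdot ab\cdot a\cdot c.$$

After dropping the leading $s=b$, the prefix of $\sigma(z)$ and the suffix of $\sigma(\mathcal{N})$ both read $ababac$, so overlapping them letter by letter produces no mismatch inside this region.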

Next, we derive a lemma formalizing that we can restrict languages of non-flat variables to flat ones, producing an equisatisfiable instance. Stating a symmetric lemma for suffixes, and applying Lemma 17 would give us the entire proof of Theorem 22.

Lemma 25.

Let $x$ be the rightmost flat variable with $\mathrm{Base}(x)=\{\alpha\}$, and let $z$ be a non-flat variable occurring in $\mathcal{H}$. Let $\varphi$ be a formula $\varphi\triangleq\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}=\neg\mathrm{Contains}(\mathcal{N}^{\prime}x\alpha^{M}pW,\mathcal{H})\land\Phi_{\mathcal{L}}$ where $\mathcal{N}^{\prime}\in(\mathbb{X}\cup\Sigma)^{*}$, $M\geq 0$, $p\neq\alpha$ is a prefix of $\alpha$, and $W=a_{0}\dots a_{n}$ is a non-empty word such that $pa_{0}$ is not a prefix of $\alpha$. There exists a flat language $\mathcal{L}^{\mathrm{Flat}}_{z}\subset\mathcal{L}_{z}$ s.t. if there is a model $\sigma\models\varphi$, then $\sigma\mathbin{\triangleleft}\{z\mapsto w_{z}\}\models\varphi[z/z\texttt{\#}]$ for some word $w_{z}\in\mathcal{L}^{\mathrm{Flat}}_{z}$.

The intuition behind Lemma 25 is the same as the intuition behind Theorem 22. We note that the lemma requires the word $W$ to be non-empty. The case $W=\varepsilon$ has a similar, but simpler, proof.

Proof sketch.

We assume the existence of a model $\sigma\models\varphi$, and we systematically explore the prefix tree $T_{z}$ of the variable $z$ in a breadth-first fashion up to the bound $\lambda_{\mathrm{Q}}$, searching for the word $w_{z}$. When inspecting any prefix $u$, we check whether $\sigma\mathbin{\triangleleft}\{z\mapsto u\}$ is a model of $\varphi[z/z\texttt{\#}]$; if not, we mark the last vertex corresponding to $u$ as a dead end and do not explore it further. At the end of the exploration, we inspect the set $\mathcal{P}_{\geq\lambda_{\mathrm{Q}}}$ of all $\lambda_{\mathrm{Q}}$-reaching paths in $T_{z}$. If there are no such paths, we know that $|\sigma(z)|<\lambda_{\mathrm{Q}}$, and hence $w_{z}$ can be found in the finite (flat) language $\{w\in\mathcal{L}_{z}\mid|w|<\lambda_{\mathrm{Q}}\}$. Otherwise, we check for every path $\pi$ in $\mathcal{P}_{\geq\lambda_{\mathrm{Q}}}$ whether $\sigma\mathbin{\triangleleft}\{z\mapsto\Gamma_{\mathrm{Pref}(z)}^{\lambda_{\kappa}}(u_{\pi})\}\models\varphi[z/z\texttt{\#}]$, where $u_{\pi}$ is the prefix corresponding to $\pi$. Since the number of possible $\lambda_{\mathrm{Q}}$-reaching paths is finite, $w_{z}$ can be found in the flat language $\{\Gamma_{\mathrm{Pref}(z)}^{\lambda_{\kappa}}(u_{\pi})\mid\pi\in\mathcal{P}_{\geq\lambda_{\mathrm{Q}}}\}$ whenever the $\Gamma_{\mathrm{Pref}(z)}$-expansion of some $u_{\pi}$ leads to a model of $\varphi[z/z\texttt{\#}]$.

Next, it might be that none of the paths in $\mathcal{P}_{\geq\lambda_{\mathrm{Q}}}$ can be $\Gamma_{\mathrm{Pref}(z)}$-expanded into a model. Let $x$ be the rightmost variable in $\mathcal{N}$ with $\mathrm{Base}(x)=\{\alpha\}$. Recall that none of the paths in $\mathcal{P}_{\geq\lambda_{\mathrm{Q}}}$ contains a dead end, and that all variables in $\mathcal{N}$ are flat and are thus assigned long words. Combined with the properties of $\Gamma_{\mathrm{Pref}(z)}$-expansion, we know the reason why $\sigma^{\prime}\triangleq\sigma\mathbin{\triangleleft}\{z\mapsto\Gamma_{\mathrm{Pref}(z)}^{\lambda_{\kappa}}(u_{\pi})\}$ fails to be a model, i.e., we almost accurately know the position of $\sigma^{\prime}(\mathcal{N})$ in $\sigma^{\prime}(\mathcal{H})$. We show that all paths in $\mathcal{P}_{\geq\lambda_{\mathrm{Q}}}$ share the same prefix of the form $s\alpha^{k}p$ for some large $k\in\mathbb{N}$, $s\in\mathrm{Suf}(\alpha)$, and $p\in\mathrm{Pref}(\alpha)$. Since $z$ is non-flat, and $\lambda_{\mathrm{Q}}$ is larger than the number of states of $\mathcal{A}_{z}$, we have opportunities to diverge from the shared prefix $s\alpha^{k}p$ in $T_{z}$. We show that diverging must immediately lead to a dead-end vertex, and in such a case $\sigma(z)$ has a prefix $s\alpha^{k}pW$. Hence, we apply Lemma 24 and obtain that $\sigma\mathbin{\triangleleft}\{z\mapsto w_{z,k}\}$ is a model of $\varphi[z/z\texttt{\#}]$, where $w_{z,k}=\Gamma_{\mathrm{Pref}(z)}^{\lambda_{\kappa}}(s\alpha^{k}pW)$. Note that $w_{z,k}$ depends on an unknown integer $k$; however, the language containing the words $w_{z,k}$ for every possible choice of $k$ is flat. Alternatively, not diverging from the path implies that $\sigma(z)\in s\alpha^{*}p^{\prime}$ for some $p^{\prime}\in\mathrm{Pref}(\alpha)$, which is again a flat language. $\hfill\blacktriangleleft$

A careful analysis of the proof of Lemma 25 reveals that the lemma, and therefore Theorem 22, is not effective in the sense that one cannot directly construct $\mathcal{L}^{\mathrm{Flat}}_{z}$. However, we can obtain a decision procedure at the cost of producing a larger flat language. We construct $T_{z}$ and all paths up to the bound $\lambda_{\mathrm{Q}}$ without having $\sigma$ available, losing the ability to mark dead-end vertices. In the resulting flat language $\mathcal{L}^{\mathrm{Flat}}_{z}$ we include all words shorter than $\lambda_{\mathrm{Q}}$, and $\Gamma_{\mathrm{Pref}(z)}$-expansions of all $\lambda_{\mathrm{Q}}$-reaching paths. The remaining parts of $\mathcal{L}^{\mathrm{Flat}}_{z}$ that correspond to the situation when no $\lambda_{\mathrm{Q}}$-reaching paths can be $\Gamma_{\mathrm{Pref}(z)}$-expanded into a model can be computed from $\mathcal{A}_{z}$ without requiring access to the original model $\sigma$. For details, we refer the reader to the full proof of Lemma 25 in [24].
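The bounded exploration of $T_{z}$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction from [24]: the automaton encoding, the function name `explore_prefix_tree`, and the optional `is_dead_end` oracle are hypothetical, the automaton is assumed to be trimmed (every state lies on some accepting run), and the $\Gamma_{\mathrm{Pref}(z)}$-expansion of the collected prefixes is not modelled.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Transitions = Dict[Tuple[int, str], Set[int]]


def explore_prefix_tree(
    delta: Transitions,
    initial: int,
    finals: Set[int],
    bound: int,
    is_dead_end: Callable[[str], bool] = lambda u: False,
) -> Tuple[List[str], List[str]]:
    """Breadth-first exploration of the prefix tree up to depth `bound`.

    Returns (accepted words shorter than bound, prefixes of length bound).
    With the default oracle no vertex is ever marked as a dead end, which
    mirrors the situation where no model sigma is available.
    """
    short_words: List[str] = []
    reaching: List[str] = []
    queue = deque([("", frozenset({initial}))])
    while queue:
        u, states = queue.popleft()
        if is_dead_end(u):
            continue  # dead-end vertex: do not explore below it
        if len(u) == bound:
            reaching.append(u)  # a bound-reaching path
            continue
        if states & finals:
            short_words.append(u)
        # Collect successor state sets, grouped by input symbol.
        successors: Dict[str, Set[int]] = {}
        for (source, symbol), targets in delta.items():
            if source in states:
                successors.setdefault(symbol, set()).update(targets)
        for symbol in sorted(successors):
            queue.append((u + symbol, frozenset(successors[symbol])))
    return short_words, reaching


# Toy automaton for (ab)*: state 0 is initial and accepting.
delta = {(0, "a"): {1}, (1, "b"): {0}}
print(explore_prefix_tree(delta, 0, {0}, 3))
```

Passing an oracle such as `lambda u: u.startswith("ab")` prunes the subtree below `ab`, which corresponds to the dead-end marking used when $\sigma$ is known.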

7 Decision Procedure

Finally, we summarize the approach described in previous sections into a decision procedure for $\neg\mathrm{Contains}$. The (nondeterministic) algorithm is shown in Algorithm 1. In the algorithm, for a negated containment $\varphi$ and a (partial) assignment $\sigma$, we use $\sigma(\varphi)$ to denote the $\neg\mathrm{Contains}$ predicate obtained from $\varphi$ by replacing every variable whose assignment is defined with the corresponding word.
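The notation $\sigma(\varphi)$ can be illustrated with a small sketch. The term representation (a list mixing `Var` objects and literal strings) and the names `substitute` and `holds_not_contains` are assumptions introduced here; they are not part of Algorithm 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Var:
    """A string variable; literals are represented as plain str."""
    name: str


Term = List[Union[Var, str]]


def substitute(term: Term, sigma: Dict[str, str]) -> Term:
    """Replace every variable that sigma defines; keep the others symbolic."""
    return [
        sigma[item.name] if isinstance(item, Var) and item.name in sigma else item
        for item in term
    ]


def holds_not_contains(needle: Term, haystack: Term, sigma: Dict[str, str]) -> bool:
    """Evaluate the negated containment under an assignment defining all variables."""
    n, h = substitute(needle, sigma), substitute(haystack, sigma)
    if any(isinstance(item, Var) for item in n + h):
        raise ValueError("sigma must assign every variable")
    return "".join(n) not in "".join(h)


needle = [Var("x"), "b"]
haystack = ["a", Var("z"), "b"]
print(substitute(haystack, {"x": "aa"}))                           # z stays symbolic
print(holds_not_contains(needle, haystack, {"x": "aa", "z": "c"}))
```

A partial assignment leaves the remaining variables in place, so the result of `substitute` is again a pair of terms over variables and literals.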

Algorithm 1 Decision Procedure for $\neg\mathrm{Contains}$.

The set $\mathcal{L}^{\mathrm{Pref}}_{x}$ ($\mathcal{L}^{\mathrm{Suf}}_{x}$) contains prefixes (suffixes) of words from $\mathcal{L}_{x}$ that might be used to find an alternative model. The $\mathsf{glue}(\mathcal{L}^{\mathrm{Pref}}_{x},\mathcal{L}^{\mathrm{Suf}}_{x})$ procedure glues together prefixes and suffixes, resulting in a language consisting of entire words (not just prefixes or suffixes) from $\mathcal{L}_{x}$. The procedure partitions the language $\mathcal{L}^{\mathrm{Pref}}_{x}$ into $\mathcal{L}^{\mathrm{Pref}}_{x}=\mathrm{P}_{w}\cup\mathrm{P}_{\mathrm{inc}}$ such that $\mathrm{P}_{\mathrm{inc}}$ consists of words resulting from an application of $\Gamma_{\mathrm{Pref}(x)}$-expansion. Intuitively, the words in $\mathrm{P}_{w}$ are words from $\mathcal{L}_{x}$, whereas the words in $\mathrm{P}_{\mathrm{inc}}$ are only prefixes that need to be completed into full words from $\mathcal{L}_{x}$ by concatenating suitable suffixes. We decompose $\mathcal{L}^{\mathrm{Suf}}_{x}$ in the same way into $\mathcal{L}^{\mathrm{Suf}}_{x}=\mathrm{S}_{w}\cup\mathrm{S}_{\mathrm{inc}}$. The procedure then returns $\mathcal{L}^{\prime}_{x}=\mathrm{P}_{w}\cup\mathrm{S}_{w}\cup\{p\gamma^{\lambda_{\kappa}}s\mid(p\gamma^{\lambda_{\kappa}},\gamma^{\lambda_{\kappa}}s)\in\mathrm{P}_{\mathrm{inc}}\times\mathrm{S}_{\mathrm{inc}}\}.$
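The set returned by $\mathsf{glue}$ can be mirrored on finite sets of words. The sketch below is an illustration under assumptions: the actual languages may be infinite flat languages, whereas here every set is finite, an incomplete prefix $p\gamma^{\lambda_{\kappa}}$ is encoded as the pair `(p, gamma)`, an incomplete suffix $\gamma^{\lambda_{\kappa}}s$ as the pair `(gamma, s)`, and the parameter `lam` stands in for $\lambda_{\kappa}$.

```python
from typing import Set, Tuple


def glue(
    p_w: Set[str],
    p_inc: Set[Tuple[str, str]],
    s_w: Set[str],
    s_inc: Set[Tuple[str, str]],
    lam: int,
) -> Set[str]:
    """Return P_w | S_w | { p gamma^lam s } over pairs that share the same gamma."""
    glued = {
        p + gamma * lam + s
        for (p, gamma) in p_inc
        for (gamma_suffix, s) in s_inc
        if gamma == gamma_suffix  # only glue along a common expansion word
    }
    return p_w | s_w | glued


result = glue(
    p_w={"ab"},
    p_inc={("c", "ab"), ("d", "a")},
    s_w={"ba"},
    s_inc={("ab", "e")},
    lam=2,
)
print(sorted(result))
```

The pair `("d", "a")` contributes nothing because no incomplete suffix was expanded with the same word, matching the side condition of the set comprehension in the formula.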

Theorem 26 (Soundness).

If Algorithm 1 terminates with an assignment $\sigma$ , then $\sigma\models\varphi$ .

Proof.

Follows from the fact that $\mathcal{L}^{\prime}_{x}\subseteq\mathcal{L}_{x}$ for every non-flat variable $x$ found in $\mathcal{H}^{\prime}$ . $\hfill\blacktriangleleft$

Theorem 27 (Completeness).

If $\varphi$ is satisfiable, then Algorithm 1 terminates with an assignment $\sigma$ such that $\sigma\models\varphi$. Otherwise, Algorithm 1 terminates with the answer $\mathsf{UNSAT}$.

Proof.

Correctness for two-sided non-flat variables follows from Theorem 11. For remaining non-flat variables, correctness follows from Lemma 25. Finally, correctness of the $\mathsf{glue}$ procedure follows from Lemma 17. $\hfill\blacktriangleleft$

Theorem 28.

A constraint $\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$ is decidable in ExpSpace.

Proof sketch.

Decidability follows from the analysis of the decision procedure in Algorithm 1 given above. As for ExpSpace membership, first notice that languages of non-flat variables occurring only in $\mathcal{H}$ are replaced with flat languages of polynomial size due to the bounds $\lambda_{\mathrm{Q}}$ and $\lambda_{\kappa}$. Algorithm 1 then uses Lemma 4, bringing the complexity of the procedure to NExpTime. However, obtaining a full model that includes all two-sided non-flat variables brings the procedure to ExpSpace, as the length of the word assigned to a two-sided non-flat variable doubles with each such variable (cf. Theorem 11). $\hfill\blacktriangleleft$
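Read literally, the doubling argument in the proof sketch amounts to a simple recurrence. As a rough illustration (the symbols $\ell_{i}$ and $k$ are introduced here and do not appear in the original), let $\ell_{i}$ bound the length of the word assigned after handling the $i$-th two-sided non-flat variable:

$$\ell_{i}\leq 2\,\ell_{i-1}\quad\Longrightarrow\quad\ell_{k}\leq 2^{k}\,\ell_{0}.$$

With $k$ two-sided non-flat variables, the assigned words may thus be exponentially long in $k$, which is consistent with the space needed to write down a full model.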

7.1 Chain-Free Word Equations with $\neg\mathrm{Contains}$

After establishing the decidability of a single $\neg\mathrm{Contains}$ predicate, we immediately obtain decidability of string fragments that permit the so-called monadic decomposition [46, 16], i.e., expressing the set of solutions as a finite disjunction of regular membership constraints $\bigwedge_{x\in\mathbb{X}}x\in\mathcal{L}_{x}$ . These include fragments such as the straight-line fragment [29] or the more expressive chain-free fragment of word equations [4] (note that [4] considers also other predicates). We can therefore easily establish the following theorem.

Theorem 29.

A formula $W\land\neg\mathrm{Contains}(\mathcal{N},\mathcal{H})\land\Phi_{\mathcal{L}}$, where $W$ is a conjunction of chain-free word equations, is decidable.
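The reduction behind Theorem 29 can be sketched as a loop over the disjuncts of a monadic decomposition. Everything in the sketch is an assumption made for illustration: the names `instantiate` and `solve_disjunction` are hypothetical, each disjunct is given as a finite language per variable, and a brute-force search over those finite languages stands in for the decision procedure of Algorithm 1, which handles regular languages in general.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Token = Tuple[str, str]  # ("var", name) or ("lit", text)


def instantiate(term: List[Token], sigma: Dict[str, str]) -> str:
    """Concatenate a term after replacing variables by their assigned words."""
    return "".join(sigma[value] if kind == "var" else value for kind, value in term)


def solve_disjunction(
    disjuncts: List[Dict[str, Set[str]]],
    needle: List[Token],
    haystack: List[Token],
) -> Optional[Dict[str, str]]:
    """Try each disjunct in turn; return a model of the negated containment or None."""
    for languages in disjuncts:
        names = sorted(languages)
        # Brute-force stand-in: enumerate all assignments of this disjunct.
        for words in product(*(sorted(languages[name]) for name in names)):
            sigma = dict(zip(names, words))
            if instantiate(needle, sigma) not in instantiate(haystack, sigma):
                return sigma
    return None


disjuncts = [
    {"x": {"ab"}, "z": {"ab"}},
    {"x": {"b"}, "z": {"a", "aa"}},
]
needle = [("var", "x")]
haystack = [("lit", "c"), ("var", "z")]
print(solve_disjunction(disjuncts, needle, haystack))
```

The first disjunct yields no model because `ab` occurs in `cab`, so the search moves on to the second disjunct and returns its first satisfying assignment.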

8 Future Work

This paper shows that chain-free word equations with regular constraints and a single instance of the $\neg\mathrm{Contains}$ predicate are decidable. There are several possible directions for future work. First, we wish to investigate the fragment where the number of $\neg\mathrm{Contains}$ constraints is not limited to a single one. Another direction is examining combinations of $\neg\mathrm{Contains}$ with other predicates, such as length constraints or disequalities. The technique for combining these from [19], based on the reduction of the constraints to reasoning over Parikh images of finite automata, is not directly applicable here. Finally, the resulting complexity of our procedure is ExpSpace; however, we have hints that the problem might in fact be solvable in NP.

References

[1] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Ahmed Rezine, and Philipp Rümmer. Flatten and conquer: A framework for efficient analysis of string constraints. In Albert Cohen and Martin T. Vechev, editors, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 602–617. ACM, 2017. doi:10.1145/3062341.3062384.
[2] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Ahmed Rezine, and Philipp Rümmer. Trau: SMT solver for string constraints. In Nikolaj S. Bjørner and Arie Gurfinkel, editors, 2018 Formal Methods in Computer Aided Design, FMCAD 2018, Austin, TX, USA, October 30 - November 2, 2018, pages 1–5. IEEE, 2018. doi:10.23919/FMCAD.2018.8602997.
[3] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Denghang Hu, Wei-Lun Tsai, Zhilin Wu, and Di-De Yen. Solving not-substring constraint with flat abstraction. In Programming Languages and Systems, pages 305–320, Cham, 2021. Springer International Publishing. doi:10.1007/978-3-030-89051-3_17.
[4] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bui Phi Diep, Lukáš Holík, and Petr Janků. Chain-free string constraints. In Yu-Fang Chen, Chih-Hong Cheng, and Javier Esparza, editors, Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Taipei, Taiwan, October 28-31, 2019, Proceedings, volume 11781 of Lecture Notes in Computer Science, pages 277–293. Springer, 2019. doi:10.1007/978-3-030-31784-3_16.
[5] C. Aiswarya, Soumodev Mal, and Prakash Saivasan. On the satisfiability of context-free string constraints with subword-ordering. In Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’22, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3531130.3533329.
[6] Rajeev Alur and Pavol Černý. Streaming transducers for algorithmic verification of single-pass list-processing programs. SIGPLAN Not., 46(1):599–610, January 2011. doi:10.1145/1925844.1926454.
[7] Clark W. Barrett, Cesare Tinelli, Morgan Deters, Tianyi Liang, Andrew Reynolds, and Nestan Tsiskaridze. Efficient solving of string constraints for security analysis. In William L. Scherlis and David Brumley, editors, Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, April 19-21, 2016, pages 4–6. ACM, 2016. doi:10.1145/2898375.2898393.
[8] Murphy Berzish, Joel D. Day, Vijay Ganesh, Mitja Kulczynski, Florin Manea, Federico Mora, and Dirk Nowotka. Towards more efficient methods for solving regular-expression heavy string constraints. Theor. Comput. Sci., 943:50–72, 2023. doi:10.1016/j.tcs.2022.12.009.
[9] Murphy Berzish, Mitja Kulczynski, Federico Mora, Florin Manea, Joel D. Day, Dirk Nowotka, and Vijay Ganesh. An SMT solver for regular expressions and linear arithmetic over string length. In Alexandra Silva and K. Rustan M. Leino, editors, Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part II, volume 12760 of Lecture Notes in Computer Science, pages 289–312. Springer, 2021. doi:10.1007/978-3-030-81688-9_14.
[10] Murphy Berzish. Z3str4: A solver for theories over strings. PhD thesis, University of Waterloo, 2021. URL: http://hdl.handle.net/10012/17102.
[11] Nikolaj S. Bjørner, Nikolai Tillmann, and Andrei Voronkov. Path feasibility analysis for string-manipulating programs. In Stefan Kowalewski and Anna Philippou, editors, Tools and Algorithms for the Construction and Analysis of Systems, 15th International Conference, TACAS 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings, volume 5505 of Lecture Notes in Computer Science, pages 307–321. Springer, 2009. doi:10.1007/978-3-642-00768-2_27.
[12] František Blahoudek, Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Word equations in synergy with regular constraints. In Marsha Chechik, Joost-Pieter Katoen, and Martin Leucker, editors, Formal Methods - 25th International Symposium, FM 2023, Lübeck, Germany, March 6-10, 2023, Proceedings, volume 14000 of Lecture Notes in Computer Science, pages 403–423. Springer, 2023. doi:10.1007/978-3-031-27481-7_23.
[13] Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. What is decidable about string constraints with the replaceall function. Proc. ACM Program. Lang., 2(POPL):3:1–3:29, 2018. doi:10.1145/3158091.
[14] Taolue Chen, Alejandro Flores-Lamas, Matthew Hague, Zhilei Han, Denghang Hu, Shuanglong Kan, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. Solving string constraints with regex-dependent functions through transducers with priorities and variables. Proc. ACM Program. Lang., 6(POPL):1–31, 2022. doi:10.1145/3498707.
[15] Taolue Chen, Matthew Hague, Jinlong He, Denghang Hu, Anthony Widjaja Lin, Philipp Rümmer, and Zhilin Wu. A decision procedure for path feasibility of string manipulating programs with integer data type. In Dang Van Hung and Oleg Sokolsky, editors, Automated Technology for Verification and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19-23, 2020, Proceedings, volume 12302 of Lecture Notes in Computer Science, pages 325–342. Springer, 2020. doi:10.1007/978-3-030-59152-6_18.
[16] Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. Decision procedures for path feasibility of string-manipulating programs with complex operations. Proc. ACM Program. Lang., 3(POPL):49:1–49:30, 2019. doi:10.1145/3290362.
[17] Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Solving string constraints with lengths by stabilization. Proceedings of the ACM on Programming Languages, 7(OOPSLA2):2112–2141, 2023. doi:10.1145/3622872.
[18] Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Z3-Noodler: An automata-based string solver. In Bernd Finkbeiner and Laura Kovács, editors, Tools and Algorithms for the Construction and Analysis of Systems - 30th International Conference, TACAS 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part I, volume 14570 of Lecture Notes in Computer Science, pages 24–33. Springer, 2024. doi:10.1007/978-3-031-57246-3_2.
[19] Yu-Fang Chen, Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. A uniform framework for handling position constraints in string solving. Proc. ACM Program. Lang., 9(PLDI), 2025. doi:10.1145/3729273.
[20] Joel D. Day, Thorsten Ehlers, Mitja Kulczynski, Florin Manea, Dirk Nowotka, and Danny Bøgsted Poulsen. On solving word equations using SAT. In RP’19, volume 11674 of LNCS, pages 93–106. Springer, 2019. doi:10.1007/978-3-030-30806-3_8.
[21] Joel D. Day, Vijay Ganesh, Nathan Grewal, and Florin Manea. On the expressive power of string constraints. Proc. ACM Program. Lang., 7(POPL), January 2023. doi:10.1145/3571203.
[22] Joel D. Day, Vijay Ganesh, Paul He, Florin Manea, and Dirk Nowotka. The satisfiability of extended word equations: The boundary between decidability and undecidability. CoRR, abs/1802.00523, 2018. arXiv:1802.00523.
[23] Leonardo Mendonça de Moura and Nikolaj S. Bjørner. Z3: An efficient SMT solver. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008. doi:10.1007/978-3-540-78800-3_24.
[24] Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. Negated string containment is decidable (technical report). CoRR, abs/2506.22061, 2025. arXiv:2506.22061.
[25] Juhani Karhumäki, Filippo Mignosi, and Wojciech Plandowski. The expressibility of languages and relations by word equations. J. ACM, 47(3):483–505, 2000. doi:10.1145/337244.337255.
[26] Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark W. Barrett, and Morgan Deters. A DPLL(T) theory solver for a theory of strings and regular expressions. In Armin Biere and Roderick Bloem, editors, Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18-22, 2014. Proceedings, volume 8559 of Lecture Notes in Computer Science, pages 646–662. Springer, 2014. doi:10.1007/978-3-319-08867-9_43.
[27] Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark W. Barrett, and Morgan Deters. An efficient SMT solver for string constraints. Formal Methods Syst. Des., 48(3):206–234, 2016. doi:10.1007/s10703-016-0247-6.
[28] Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark W. Barrett. A decision procedure for regular membership and length constraints over unbounded strings. In Carsten Lutz and Silvio Ranise, editors, Frontiers of Combining Systems - 10th International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, 2015. Proceedings, volume 9322 of Lecture Notes in Computer Science, pages 135–150. Springer, 2015. doi:10.1007/978-3-319-24246-0_9.
[29] Anthony Widjaja Lin and Pablo Barceló. String solving with word equations and transducers: Towards a logic for analysing mutation XSS. In Rastislav Bodík and Rupak Majumdar, editors, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 123–136. ACM, 2016. doi:10.1145/2837614.2837641.
[30] M. Lothaire, editor. Combinatorics on Words. Cambridge Mathematical Library. Cambridge University Press, 2 edition, 1997.
[31] M. Lothaire. Algebraic Combinatorics on Words. Cambridge University Press, 2002.
[32] Kevin Lotz, Amit Goel, Bruno Dutertre, Benjamin Kiesl-Reiter, Soonho Kong, Rupak Majumdar, and Dirk Nowotka. Solving string constraints using SAT. In Constantin Enea and Akash Lal, editors, Computer Aided Verification, pages 187–208, Cham, 2023. Springer Nature Switzerland. doi:10.1007/978-3-031-37703-7_9.
[33] Zhengyang Lu, Stefan Siemer, Piyush Jha, Joel D. Day, Florin Manea, and Vijay Ganesh. Layered and staged Monte Carlo tree search for SMT strategy synthesis. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 1907–1915. ijcai.org, 2024. URL: https://www.ijcai.org/proceedings/2024/211.
[34] R. C. Lyndon and M. P. Schützenberger. The equation $a^{M}=b^{N}c^{P}$ in a free group. Michigan Mathematical Journal, 9(4):289–298, 1962. doi:10.1307/mmj/1028998766.
[35] G S Makanin. The problem of solvability of equations in a free semigroup. Mathematics of the USSR-Sbornik, 32(2):129, February 1977. doi:10.1070/SM1977v032n02ABEH002376.
[36] Jakob Nielsen. Die Isomorphismen der allgemeinen, unendlichen Gruppe mit zwei Erzeugenden. Mathematische Annalen, 78(1):385–397, 1917.
[37] Aina Niemetz, Mathias Preiner, Andrew Reynolds, Clark W. Barrett, and Cesare Tinelli. Syntax-guided quantifier instantiation. In Jan Friso Groote and Kim Guldstrand Larsen, editors, Tools and Algorithms for the Construction and Analysis of Systems - 27th International Conference, TACAS 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, Part II, volume 12652 of Lecture Notes in Computer Science, pages 145–163. Springer, 2021. doi:10.1007/978-3-030-72013-1_8.
[38] Andres Nötzli, Andrew Reynolds, Haniel Barbosa, Clark W. Barrett, and Cesare Tinelli. Even faster conflicts and lazier reductions for string solvers. In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part II, volume 13372 of Lecture Notes in Computer Science, pages 205–226. Springer, 2022. doi:10.1007/978-3-031-13188-2_11.
[39] Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. In 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA, pages 495–500. IEEE Computer Society, 1999. doi:10.1109/SFFCS.1999.814622.
[40] Andrew Reynolds, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. High-level abstractions for simplifying extended string constraints in SMT. In Isil Dillig and Serdar Tasiran, editors, Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part II, volume 11562 of Lecture Notes in Computer Science, pages 23–42. Springer, 2019. doi:10.1007/978-3-030-25543-5_2.
[41] Andrew Reynolds, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. Reductions for strings and regular expressions revisited. In 2020 Formal Methods in Computer Aided Design, FMCAD 2020, Haifa, Israel, September 21-24, 2020, pages 225–235. IEEE, 2020. doi:10.34727/2020/isbn.978-3-85448-042-6_30.
[42] Andrew Reynolds, Maverick Woo, Clark W. Barrett, David Brumley, Tianyi Liang, and Cesare Tinelli. Scaling up DPLL(T) string solvers using context-dependent simplification. In Rupak Majumdar and Viktor Kuncak, editors, Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes in Computer Science, pages 453–474. Springer, 2017. doi:10.1007/978-3-319-63390-9_24.
[43] Neha Rungta. A billion SMT queries a day (invited paper). In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part I, volume 13371 of Lecture Notes in Computer Science, pages 3–18. Springer, 2022. doi:10.1007/978-3-031-13185-1_1.
[44] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. A symbolic execution framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berkeley/Oakland, California, USA, pages 513–528. IEEE Computer Society, 2010. doi:10.1109/SP.2010.38.
[45] Trauc string constraints benchmark collection, 2020. URL: https://github.com/plfm-iis/trauc_benchmarks.
[46] Margus Veanes, Nikolaj S. Bjørner, Lev Nachmanson, and Sergey Bereg. Monadic decomposition. J. ACM, 64(2):14:1–14:28, 2017. doi:10.1145/3040488.

[bib.bib1] [1] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Ahmed Rezine, and Philipp Rümmer. Flatten and conquer: A framework for efficient analysis of string constraints. In Albert Cohen and Martin T. Vechev, editors, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 602–617. ACM, 2017. doi:10.1145/3062341.3062384.

[bib.bib2] [2] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Ahmed Rezine, and Philipp Rümmer. Trau: SMT solver for string constraints. In Nikolaj S. Bjørner and Arie Gurfinkel, editors, 2018 Formal Methods in Computer Aided Design, FMCAD 2018, Austin, TX, USA, October 30 - November 2, 2018, pages 1–5. IEEE, 2018. doi:10.23919/FMCAD.2018.8602997.

[bib.bib3] [3] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Bui Phi Diep, Lukáš Holík, Denghang Hu, Wei-Lun Tsai, Zhillin Wu, and Di-De Yen. Solving not-substring constraint with flat abstraction. In Programming Languages and Systems, pages 305–320, Cham, 2021. Springer International Publishing. doi:10.1007/978-3-030-89051-3_17.

[bib.bib4] [4] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bui Phi Diep, Lukáš Holík, and Petr Janků. Chain-free string constraints. In Yu-Fang Chen, Chih-Hong Cheng, and Javier Esparza, editors, Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Taipei, Taiwan, October 28-31, 2019, Proceedings, volume 11781 of Lecture Notes in Computer Science, pages 277–293. Springer, 2019. doi:10.1007/978-3-030-31784-3_16.

[bib.bib5] [5] C. Aiswarya, Soumodev Mal, and Prakash Saivasan. On the satisfiability of context-free string constraints with subword-ordering. In Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’22, New York, NY, USA, 2022. Association for Computing Machinery. doi:10.1145/3531130.3533329.

[bib.bib6] [6] Rajeev Alur and Pavol Černý. Streaming transducers for algorithmic verification of single-pass list-processing programs. SIGPLAN Not., 46(1):599–610, January 2011. doi:10.1145/1925844.1926454.

[bib.bib7] [7] Clark W. Barrett, Cesare Tinelli, Morgan Deters, Tianyi Liang, Andrew Reynolds, and Nestan Tsiskaridze. Efficient solving of string constraints for security analysis. In William L. Scherlis and David Brumley, editors, Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, April 19-21, 2016, pages 4–6. ACM, 2016. doi:10.1145/2898375.2898393.

[bib.bib8] [8] Murphy Berzish, Joel D. Day, Vijay Ganesh, Mitja Kulczynski, Florin Manea, Federico Mora, and Dirk Nowotka. Towards more efficient methods for solving regular-expression heavy string constraints. Theor. Comput. Sci., 943:50–72, 2023. doi:10.1016/j.tcs.2022.12.009.

[bib.bib9] [9] Murphy Berzish, Mitja Kulczynski, Federico Mora, Florin Manea, Joel D. Day, Dirk Nowotka, and Vijay Ganesh. An SMT solver for regular expressions and linear arithmetic over string length. In Alexandra Silva and K. Rustan M. Leino, editors, Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part II, volume 12760 of Lecture Notes in Computer Science, pages 289–312. Springer, 2021. doi:10.1007/978-3-030-81688-9_14.

[bib.bib10] [10] Berzish, Murphy. Z3str4: A solver for theories over strings. PhD thesis, University of Waterloo, 2021. URL: http://hdl.handle.net/10012/17102.

[bib.bib11] [11] Nikolaj S. Bjørner, Nikolai Tillmann, and Andrei Voronkov. Path feasibility analysis for string-manipulating programs. In Stefan Kowalewski and Anna Philippou, editors, Tools and Algorithms for the Construction and Analysis of Systems, 15th International Conference, TACAS 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings, volume 5505 of Lecture Notes in Computer Science, pages 307–321. Springer, 2009. doi:10.1007/978-3-642-00768-2_27.

[bib.bib12] [12] František Blahoudek, Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Word equations in synergy with regular constraints. In Marsha Chechik, Joost-Pieter Katoen, and Martin Leucker, editors, Formal Methods - 25th International Symposium, FM 2023, Lübeck, Germany, March 6-10, 2023, Proceedings, volume 14000 of Lecture Notes in Computer Science, pages 403–423. Springer, 2023. doi:10.1007/978-3-031-27481-7_23.

[bib.bib13] [13] Taolue Chen, Yan Chen, Matthew Hague, Anthony W. Lin, and Zhilin Wu. What is decidable about string constraints with the replaceall function. Proc. ACM Program. Lang., 2(POPL):3:1–3:29, 2018. doi:10.1145/3158091.

[bib.bib14] [14] Taolue Chen, Alejandro Flores-Lamas, Matthew Hague, Zhilei Han, Denghang Hu, Shuanglong Kan, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. Solving string constraints with regex-dependent functions through transducers with priorities and variables. Proc. ACM Program. Lang., 6(POPL):1–31, 2022. doi:10.1145/3498707.

[bib.bib15] [15] Taolue Chen, Matthew Hague, Jinlong He, Denghang Hu, Anthony Widjaja Lin, Philipp Rümmer, and Zhilin Wu. A decision procedure for path feasibility of string manipulating programs with integer data type. In Dang Van Hung and Oleg Sokolsky, editors, Automated Technology for Verification and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19-23, 2020, Proceedings, volume 12302 of Lecture Notes in Computer Science, pages 325–342. Springer, 2020. doi:10.1007/978-3-030-59152-6_18.

[16] Taolue Chen, Matthew Hague, Anthony W. Lin, Philipp Rümmer, and Zhilin Wu. Decision procedures for path feasibility of string-manipulating programs with complex operations. Proc. ACM Program. Lang., 3(POPL):49:1–49:30, 2019. doi:10.1145/3290362.

[17] Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Solving string constraints with lengths by stabilization. Proceedings of the ACM on Programming Languages, 7(OOPSLA2):2112–2141, 2023. doi:10.1145/3622872.

[18] Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, and Juraj Síč. Z3-Noodler: An automata-based string solver. In Bernd Finkbeiner and Laura Kovács, editors, Tools and Algorithms for the Construction and Analysis of Systems - 30th International Conference, TACAS 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part I, volume 14570 of Lecture Notes in Computer Science, pages 24–33. Springer, 2024. doi:10.1007/978-3-031-57246-3_2.

[19] Yu-Fang Chen, Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. A uniform framework for handling position constraints in string solving. Proc. ACM Program. Lang., 9(PLDI), 2025. doi:10.1145/3729273.

[20] Joel D. Day, Thorsten Ehlers, Mitja Kulczynski, Florin Manea, Dirk Nowotka, and Danny Bøgsted Poulsen. On solving word equations using SAT. In RP’19, volume 11674 of LNCS, pages 93–106. Springer, 2019. doi:10.1007/978-3-030-30806-3_8.

[21] Joel D. Day, Vijay Ganesh, Nathan Grewal, and Florin Manea. On the expressive power of string constraints. Proc. ACM Program. Lang., 7(POPL), January 2023. doi:10.1145/3571203.

[22] Joel D. Day, Vijay Ganesh, Paul He, Florin Manea, and Dirk Nowotka. The satisfiability of extended word equations: The boundary between decidability and undecidability. CoRR, abs/1802.00523, 2018. arXiv:1802.00523.

[23] Leonardo Mendonça de Moura and Nikolaj S. Bjørner. Z3: An efficient SMT solver. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008. doi:10.1007/978-3-540-78800-3_24.

[24] Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. Negated string containment is decidable (technical report). CoRR, abs/2506.22061, 2025. arXiv:2506.22061.

[25] Juhani Karhumäki, Filippo Mignosi, and Wojciech Plandowski. The expressibility of languages and relations by word equations. J. ACM, 47(3):483–505, 2000. doi:10.1145/337244.337255.

[26] Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark W. Barrett, and Morgan Deters. A DPLL(T) theory solver for a theory of strings and regular expressions. In Armin Biere and Roderick Bloem, editors, Computer Aided Verification - 26th International Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 18-22, 2014. Proceedings, volume 8559 of Lecture Notes in Computer Science, pages 646–662. Springer, 2014. doi:10.1007/978-3-319-08867-9_43.

[27] Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark W. Barrett, and Morgan Deters. An efficient SMT solver for string constraints. Formal Methods Syst. Des., 48(3):206–234, 2016. doi:10.1007/s10703-016-0247-6.

[28] Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark W. Barrett. A decision procedure for regular membership and length constraints over unbounded strings. In Carsten Lutz and Silvio Ranise, editors, Frontiers of Combining Systems - 10th International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, 2015. Proceedings, volume 9322 of Lecture Notes in Computer Science, pages 135–150. Springer, 2015. doi:10.1007/978-3-319-24246-0_9.

[29] Anthony Widjaja Lin and Pablo Barceló. String solving with word equations and transducers: Towards a logic for analysing mutation XSS. In Rastislav Bodík and Rupak Majumdar, editors, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 123–136. ACM, 2016. doi:10.1145/2837614.2837641.

[30] M. Lothaire, editor. Combinatorics on Words. Cambridge Mathematical Library. Cambridge University Press, 2nd edition, 1997.

[31] M. Lothaire. Algebraic Combinatorics on Words. Cambridge University Press, 2002.

[32] Kevin Lotz, Amit Goel, Bruno Dutertre, Benjamin Kiesl-Reiter, Soonho Kong, Rupak Majumdar, and Dirk Nowotka. Solving string constraints using SAT. In Constantin Enea and Akash Lal, editors, Computer Aided Verification, pages 187–208, Cham, 2023. Springer Nature Switzerland. doi:10.1007/978-3-031-37703-7_9.

[33] Zhengyang Lu, Stefan Siemer, Piyush Jha, Joel D. Day, Florin Manea, and Vijay Ganesh. Layered and staged Monte Carlo tree search for SMT strategy synthesis. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 1907–1915. ijcai.org, 2024. URL: https://www.ijcai.org/proceedings/2024/211.

[34] R. C. Lyndon and M. P. Schützenberger. The equation $a^{M}=b^{N}c^{P}$ in a free group. Michigan Mathematical Journal, 9(4):289–298, 1962. doi:10.1307/mmj/1028998766.

[35] G. S. Makanin. The problem of solvability of equations in a free semigroup. Mathematics of the USSR-Sbornik, 32(2):129, February 1977. doi:10.1070/SM1977v032n02ABEH002376.

[36] Jakob Nielsen. Die Isomorphismen der allgemeinen, unendlichen Gruppe mit zwei Erzeugenden. Mathematische Annalen, 78(1):385–397, 1917.

[37] Aina Niemetz, Mathias Preiner, Andrew Reynolds, Clark W. Barrett, and Cesare Tinelli. Syntax-guided quantifier instantiation. In Jan Friso Groote and Kim Guldstrand Larsen, editors, Tools and Algorithms for the Construction and Analysis of Systems - 27th International Conference, TACAS 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, Part II, volume 12652 of Lecture Notes in Computer Science, pages 145–163. Springer, 2021. doi:10.1007/978-3-030-72013-1_8.

[38] Andres Nötzli, Andrew Reynolds, Haniel Barbosa, Clark W. Barrett, and Cesare Tinelli. Even faster conflicts and lazier reductions for string solvers. In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part II, volume 13372 of Lecture Notes in Computer Science, pages 205–226. Springer, 2022. doi:10.1007/978-3-031-13188-2_11.

[39] Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. In 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA, pages 495–500. IEEE Computer Society, 1999. doi:10.1109/SFFCS.1999.814622.

[40] Andrew Reynolds, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. High-level abstractions for simplifying extended string constraints in SMT. In Isil Dillig and Serdar Tasiran, editors, Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part II, volume 11562 of Lecture Notes in Computer Science, pages 23–42. Springer, 2019. doi:10.1007/978-3-030-25543-5_2.

[41] Andrew Reynolds, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. Reductions for strings and regular expressions revisited. In 2020 Formal Methods in Computer Aided Design, FMCAD 2020, Haifa, Israel, September 21-24, 2020, pages 225–235. IEEE, 2020. doi:10.34727/2020/isbn.978-3-85448-042-6_30.

[42] Andrew Reynolds, Maverick Woo, Clark W. Barrett, David Brumley, Tianyi Liang, and Cesare Tinelli. Scaling up DPLL(T) string solvers using context-dependent simplification. In Rupak Majumdar and Viktor Kuncak, editors, Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes in Computer Science, pages 453–474. Springer, 2017. doi:10.1007/978-3-319-63390-9_24.

[43] Neha Rungta. A billion SMT queries a day (invited paper). In Sharon Shoham and Yakir Vizel, editors, Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part I, volume 13371 of Lecture Notes in Computer Science, pages 3–18. Springer, 2022. doi:10.1007/978-3-031-13185-1_1.

[44] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. A symbolic execution framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16-19 May 2010, Berkeley/Oakland, California, USA, pages 513–528. IEEE Computer Society, 2010. doi:10.1109/SP.2010.38.

[45] Trauc string constraints benchmark collection, 2020. URL: https://github.com/plfm-iis/trauc_benchmarks.

[46] Margus Veanes, Nikolaj S. Bjørner, Lev Nachmanson, and Sergey Bereg. Monadic decomposition. J. ACM, 64(2):14:1–14:28, 2017. doi:10.1145/3040488.

$$\begin{array}{rrll}
 & r(4\|\beta\|+4\|\alpha\|) & =\|o_{1}\|+\|s\|+\|\sigma(\mathcal{H}_{i})\|+\|p\|+\|o_{2}\| & \lbag\text{def.\ of $\gamma$}\rbag \\
\Rightarrow & 7r\|\alpha\|+r\|\alpha\| & \leq\|o_{1}\|+r\|\alpha\|+\|o_{2}\| & \lbag\text{since $\|\alpha\|=\|\beta\|$ and def.\ of $r$}\rbag \qquad (3) \\
\Leftrightarrow & 7r\|\alpha\| & \leq\|o_{1}\|+\|o_{2}\| &
\end{array}$$
