Abstract 1 Introduction 2 Preliminaries 3 Results for Pattern Languages with Length Constraints 4 Result for Pattern Languages with Regular and Length Constraints 5 Further Discussion References Appendix A Definition of Nondeterministic 2-Counter Automata without Input Appendix B Definition of the Universal Turing Machine 𝑼 Appendix C Extension to Larger Alphabets in Theorem 8 Appendix D Extension to Larger Alphabets in Theorem 12 Appendix E NP-hardness of the Unary Case for the Equivalence Problem for Non-Erasing Pattern Languages with Length Constraints

The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable

Dirk Nowotka ORCID Department of Computer Science, Kiel University, Germany Max Wiedenhöft Department of Computer Science, Kiel University, Germany
Abstract

Patterns are words with terminals and variables. The language of a pattern is the set of words obtained by uniformly substituting all variables with words that contain only terminals. Length constraints restrict valid substitutions of variables by associating the variables of a pattern with a system (or disjunction of systems) of linear diophantine inequalities. Pattern languages with length constraints contain only words in which all variables are substituted to words with lengths that fulfill such a given set of length constraints. We consider membership, inclusion, and equivalence problems for erasing and non-erasing pattern languages with length constraints.

Our main result shows that the erasing equivalence problem - one of the most prominent open problems in the realm of patterns - becomes undecidable if length constraints are allowed in addition to variable equality.

Additionally, it is shown that the terminal-free inclusion problem, a prominent problem which has been shown to be undecidable in the binary case for patterns without any constraints, is also generally undecidable for all larger alphabets in this setting.

Finally, we also show that considering regular constraints, i.e., associating variables also with regular languages as additional restrictions together with length constraints for valid substitutions, results in undecidability of the non-erasing equivalence problem. This sets a first upper bound on constraints to obtain undecidability in this case, as this problem is trivially decidable in the case of no constraints and as it has unknown decidability if only regular or only length constraints are considered.

Keywords and phrases:
Patterns, Pattern Languages, Length Constraints, Regular Constraints, Decidability, Undecidability, Membership, Inclusion, Equivalence
Funding:
Max Wiedenhöft: This work was supported by the DFG project number 437493335.
Copyright and License:
[Uncaptioned image] © Dirk Nowotka and Max Wiedenhöft; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Formal languages and automata theory
Related Version:
Full Version: https://arxiv.org/abs/2411.06904 [27]
Editors:
Paola Bonizzoni and Veli Mäkinen

1 Introduction

A pattern is a finite word consisting only of symbols from a finite set of letters Σ={𝚊1,,𝚊σ}, also called terminals, and from an infinite set of variables X={x1,x2,} such that we have ΣX=. It is a natural and compact device to define formal languages. From patterns, we obtain words consisting only of terminals using a substitution h, a terminal preserving morphism that maps all variables in a pattern to words over the terminal alphabet. The language of a pattern is the set of all words obtainable using arbitrary substitutions.

We differentiate between two kinds of substitutions. In the original definition of patterns and pattern languages introduced by Angluin [1], only words obtained by non-erasing substitutions are considered. Here, all variables are required to be mapped to non-empty words. The resulting languages are called non-erasing (NE) pattern languages. Later, so called erasing-/extended- or just E-pattern languages have been introduced by Shinohara [35]. Here, variables may also be substituted by the empty word ε. Consider, for example, the pattern α:=x1𝚊x2𝚋x1. Then, if we map x1 to 𝚊𝚋 and x2 to 𝚋𝚋𝚊𝚊 using a substitution h, we obtain the word h(α)=𝚊𝚋𝚊𝚋𝚋𝚊𝚊𝚋𝚊𝚋. Considering the E-pattern language of α, we could also map x1 to the empty word and obtain any word in the language {𝚊}Σ{𝚋}.

Due to its practical and simple definition, patterns and their corresponding languages occur in numerous areas in computer science and discrete mathematics. These include, for example, unavoidable patterns [18, 23], algorithmic learning theory [1, 6, 36], word equations [23], theory of extended regular expressions with back references [12], or database theory [10, 34].

There are three main decision problems regarding patterns and pattern languages. Those are the membership problem (and its variations [14, 15, 7]), the inclusion problem, and the equivalence problem. All are considered in the erasing (E) and non-erasing (NE) cases. The membership problem determines if a word belongs to the language of a pattern. This problem has been shown to be NP-complete for both, erasing- and non-erasing, pattern languages [1, 18]. The inclusion problem determines whether the language of one pattern is a subset of the language of another pattern. It has been shown to be generally undecidable by Jiang et al. in [19]. Freydenberger and Reidenbach [11] as well as Bremer and Freydenberger [3] improved that result and showed that it is undecidable for all bounded alphabets of size |Σ|2 for both erasing and non-erasing pattern languages. The equivalence problem asks whether the languages of two patterns are equal. For NE-pattern languages, this problem is trivially decidable and characterized by the equality of patterns up to a renaming of their variables [1]. The decidability of the erasing case is one of the major open problems in the field [19, 30, 29, 28, 31]. For terminal-free patterns, however, i.e., patterns without any terminal letters, the inclusion problem as well as the equivalence problem for E-pattern languages have been characterized and shown to be NP-complete [19, 5]. In addition to that, in the case of terminal-free NE-pattern languages, Saarela [32] has shown that the inclusion problem is undecidable in the case of a binary underlying alphabet.

Over time, various extensions to patterns and pattern languages have been introduced, either, to obtain additional expressibility due to some practical context or to get closer to an answer for the remaining open problems. Some examples are the bounded scope coincidence degree, patterns with bounded treewidth, k-local patterns, or strongly-nested patterns (see [4] and references therein). Koshiba [21] introduced so called typed patterns that restrict substitutions of variables to types, i.e., arbitrary recursive languages. Geilke and Zilles [16] extended this recently to the notion of relational patterns and relational pattern languages. In a similar more recent context and with specific relational constraints such as equal length, besides string equality, Freydenberger discusses in [8], building on [2, 13, 9], so called conjunctive regular path queries (CRPQs) with string equality and with length equality. They can be understood as systems of (relational) patterns with regular constraints (restricting variable substitutions to words of given regular languages) and with equal length constraints between variables.

In [26], a special form of typed patterns has been considered, i.e., patterns with regular constraints (also comparable to singleton sets of H-Systems [13]). Here, like mentioned before, variables may be restricted to arbitrary regular languages and the same variable may occur more than once. It has been shown that this notion suffices to obtain undecidability for both main open problems regarding pattern languages, i.e., the equivalence problem of E-pattern languages and the inclusion problem of terminal-free NE-pattern languages with an alphabet greater or equal to 3. Another natural extension other than regular constraints is the notion of length constraints. Here, instead of restricting the choice of words for the substitution of variables, length constraints just restrict the lengths of substitution of variables in relation to each other. In the field of word equations, length constraints have been considered as a natural extension for a long time and, e.g., answering the decidability of the question whether word equations with length constraints have a solution, is a long outstanding problem (see, e.g., [22] and the references therein).

In this paper, we consider that natural extension of length constraints on patterns, resulting in the class of patterns called patterns with length constraints. In general, we say that a length constraint is a disjunction of systems of linear (diophantine) inequalities over the variables of X. We denote the set of all length constraints by 𝒞Len. A pattern with length constraints (α,α)(ΣX)×𝒞Len is a pattern associated with a length constraint. We say that a substitution h is α-valid if all variables are substituted according to α. Now, the language of (α,α) is defined analogously to pattern languages but restricted to α-valid substitutions in the erasing- and non-erasing cases.

We examine erasing (E) and non-erasing (NE) pattern languages with length constraints. It can be shown, following existing results for patterns without additional constraints, that the membership problem for both cases in NP-complete. The inclusion problem is shown to be undecidable in both cases, too, notably even for terminal-free pattern languages, which is a difference to the decidability of the inclusion problem in the erasing case for pattern languages without any constraints, and which answers the case of alphabet sizes greater or equal to 3 for non-erasing patterns without constraints. The main result of this paper is the undecidability of the equivalence problem for erasing pattern languages with length constraints in both cases, terminal-free and general, giving an answer to a problem of which the decidability has been an open problem for a long time in the case of no constraints. The final result shows that regular constraints and length constraints combined suffice to show undecidability of the equivalence problem for non-erasing pattern languages, a problem that is trivially decidable in case of no constraints and still open in the cases of just regular constraints or just length constraints.

We begin by introducing the necessary notation in Section 2 and then continue in Section 3 with an examination of patterns with length constraints and their corresponding languages. Here, we will briefly discuss all results regarding the related decision problems (membership, inclusion, equivalence). Due to the extensiveness of some proofs, see [27] for the full formal versions of the shortened ones. In Section 4, we continue with an examination of patterns with regular and length constraints, also giving a full picture of the related decision problems. Finally, in Section 5, we close this paper with a summary and discussion of the obtained results, the methods that were used, and the open problems that remain.

2 Preliminaries

Let denote the natural numbers. For n,m set [m,n]:={kmkn}. Denote [n]:=[1,n] and [n]0:=[0,n]. The powerset of any set A is denoted by 𝒫(A). An alphabet Σ is a non-empty finite set whose elements are called letters or terminals. A word is a finite sequence of letters from Σ. Let Σ be the set of all finite words over Σ, thus it is a free monoid with concatenation as operation and the empty word ε as the neutral element. Set Σ+:=Σ{ε}. We call the number of letters in a word wΣ length of w, denoted by |w|. Therefore, we have |ε|=0. Let Σk denote the set of all words of length k (resp. Σk or Σk). If w=xyz for some x,y,zΣ, we call x a prefix of w, y a factor of w, and z a suffix of w and denote the sets of all prefixes, factors, and suffixes of w by Pref(w), Fact(w), and Suff(w) respectively. For words w,uΣ, let |w|u denote the number of distinct occurrences of u in w as a factor. For wΣ, let w[i] denote w’s ith letter for all i[|w|]. For reasons of compactness, we denote w[i]w[j] by w[ij] for all i,j[|w|] with i<j. Set alph(w):={𝚊Σi[|w|]:w[i]=𝚊} as w’s alphabet.

Now, we introduce the notion of patterns and pattern languages with and without additional constraints (i.e., length constraints, length constraints, and a combination of both). After that, we briefly introduce the machine models used in the proofs of the main results of this paper.

2.1 Patterns and Pattern Languages with Constraints

Let X be a countable set of variables and Σ be an alphabet such that ΣX=. A pattern is then a non-empty, finite word over ΣX. A pattern is called terminal-free if it just consists of variables and is, thus, a non-empty finite word over X. The set of all patterns over ΣX is denoted by PatΣ. For example, x1𝚊x2𝚋𝚊x2x3 is a pattern over Σ={𝚊,𝚋} with x1,x2,x3X. For a pattern αPatΣ, let var(α):={xX||α|x1} denote the set of variables occurring in p. A substitution of α is a morphism h:(ΣX)Σ such that h(𝚊)=𝚊 for all 𝚊Σ and h(x)Σ for all xX. If we have h(x)ε for all xvar(α), we call h a non-erasing substitution for α. Otherwise, h is an erasing substitution for α. The set of all substitutions w.r.t. Σ is denoted by HΣ. If Σ is clear from the context, we may write just H. Given a pattern αPatΣ, its erasing pattern language LE(α) and its non-erasing pattern language LNE(α) are defined respectively by

LE(α) :={h(α)|hH,h(x)Σ for all xvar(α)}, and
LNE(α) :={h(α)|hH,h(x)Σ+ for all xvar(α)}.

A length constraint is a disjunction of systems of linear diophantine inequalities over variables of X. We denote the set of all length constraints by 𝒞Len. A pattern with length constraints is a pair (α,α)𝙿𝚊𝚝Σ×𝒞Len where all variables occurring in α must occur in α. We denote the set of all patterns with length constraints by 𝙿𝚊𝚝Σ,𝒞Len. For some (α,α)𝙿𝚊𝚝Σ,𝒞Len and hH, we say that h is a α-valid substitution if α is satisfied when associating each variable xvar(α) in α with the value |h(x)|, i.e., the length of the substitution of the variable x. Consider the following example.

Example 1.

Let Σ={𝚊,𝚋} and α=x1𝚊x2𝚊x1. Assume we have the length constraint α defined by the following system of linear diophantine inequalities:

2x1+x25
x21

Then, we know that any α-valid substitution hH cannot have h(x1)=u for some uΣ with |u|3, as this would already imply 2|u|=2|h(x)|23=6 and 65. Also, we see that h(x2)ε as the second constraint demands a substitution of length at least 1. So, for example we could have h(α)=𝚋𝚋𝚊𝚋𝚊𝚋𝚋 or h(α)=𝚋𝚊𝚋𝚊𝚋𝚊𝚋 but not h(α)=𝚊𝚊 or h(α)=𝚋𝚋𝚊𝚊𝚊𝚊𝚋𝚋.

We denote the set of all α-valid substitutions by Hα. The notion of pattern languages is extended by the following. For any (α,α)𝙿𝚊𝚝Σ,𝒞Len we denote by

LE(α,α):={h(α)|hHα,h(x)Σ for all xvar(α)}

the erasing pattern language with length constraints of (α,α) and by

LNE(α,α):={h(α)|hHα,h(x)Σ+ for all xvar(p)}

the non-erasing pattern language with length constraints of (α,α).

Similar to length constraints, we can define regular constraints for variables in a pattern. Let Reg be the set of all regular languages. We call a mapping r:XReg a regular constraint on X. If not stated otherwise, we always have r(x)=Σ. We denote the set of all regular constraints by 𝒞Reg. For some r𝒞Reg we define the language of a variable xX by Lr(x)=r(x). If r is clear by the context, we omit it and just write L(x). A pattern with regular constraints is a pair (α,rα)PatΣ×𝒞Reg. We denote the set of all patterns with regular constraints by 𝙿𝚊𝚝Σ,𝒞Reg. For some (α,rα)𝙿𝚊𝚝Σ,𝒞Reg and hH, we say that h is a rα-valid substitution if h(x)L(x) for all xvar(α). The set of all rα-valid substitutions is denoted by Hrp. Given some (α,rα)𝙿𝚊𝚝Σ,𝒞Reg, we analogously define the erasing- and non-erasing pattern languages with regular constraints LE(α,rα) and LNE(α,rα) over Hrα as we did for length constraints.

Combining both, we say that a triple (α,rα,α)𝙿𝚊𝚝×𝒞Reg×𝒞Len is a pattern with regular and length constraints and denote the set of all patterns with regular and length constraints by 𝙿𝚊𝚝Σ,𝒞Reg,Len. Given some (α,rα,α)𝙿𝚊𝚝Σ,𝒞Reg,Len, we say that a substitution hH is rα-α-valid if it is rα-valid and α-valid and denote the set of all rα-α-valid substitutions by Hrα,α. Additionally, we analogously define the erasing- and non-erasing pattern languages with regular and length constraints LE(α,rα,α) and LNE(α,rα,α) over Hrα,α as we did in the previous two cases for length constraints and regular constraints.

In the proofs of our main results Theorem 8, Theorem 12, and Theorem 19, we use two different automata models with undecidable emptiness problems to obtain each result. We will use the notion of nondeterministic 2-counter automata without input (see, e.g., [17]), as well as, the notion of a very specific universal Turing machine U as it is used in [3]. As mentioned in the introduction, due to their extensive lengths, the proofs of each of the main theorems is only given as an extended sketch in this paper together with the appendix111See [27] for the full formal proofs.. In the main body, only a rough explanation, not relying on a formal definition of the machine types, of each proof idea is provided. Hence, the formal definition of the notion of nondeterministic 2-counter automata without input and the referenced universal Turing machine U are found in Appendix A and in Appendix B, respectively. We just mention that 𝚅𝚊𝚕𝙲(A) refers to the set of encodings of valid computations of some nondeterministic 2-counter machine without input A and that 𝚅𝚊𝚕𝙲U(I) refers to the set of encodings of valid computations from some initial configuration I over the universal Turing machine U.

We continue with our overview of the results regarding pattern languages with length constraints.

3 Results for Pattern Languages with Length Constraints

To begin developing an understanding of the additional expressiveness gained by allowing for length constraints, notice the following observation for patterns with length or with regular constraints which does not necessarily hold for patterns without any constraints.

Lemma 2.

For each pattern with length constraints (α,α)𝙿𝚊𝚝Σ,𝒞Len (and for each pattern with regular constraints (β,rβ)𝙿𝚊𝚝Σ,𝒞Reg), there exists some adapted set of length constraints α𝒞Len (resp., some adapted set of regular constraints rβ𝒞Reg) such that LNE(α,α)=LE(α,α) (and LNE(β,rβ)=LE(β,rβ) resp.).

Proof.

Indeed, given some pattern with length constraints (α,α)𝙿𝚊𝚝Σ,𝒞Len, we can define the length constraint α by using all constraints in α and additionally, for each xvar(α), adding the constraint x1 to α. Then, LE(α,α)=LNE(α,α).

We obtain the same result for pattern languages with regular constraints by intersecting the language of all variables with Σ+, i.e., given any language L(x) for some variable xX that is defined by some regular constraint r𝒞Reg, we can define a regular constraint r that defines the language L(x)=L(x)Σ+. Then, given some pattern α𝙿𝚊𝚝, we obtain LNE(α,r)=LE(α,r).

The following statement then immediately follows by the previous lemma.

Corollary 3.

Solving any problem for erasing pattern languages with length (or regular) constraints is at least as hard as solving the same problem for non-erasing pattern languages with length constraints.

As shown in [26], considering regular constraints, problem cases over all patterns can be reduced to problems involving terminal-free patterns. The same does not hold in general in the case of length constraints, witnessed by the following proposition.

Proposition 4.

There exists (α,α)𝙿𝚊𝚝Σ,𝒞Len such that no (β,β)𝙿𝚊𝚝Σ,𝒞Len with a terminal-free pattern βX exists, for which we have that LX(α,α)=LX(β,β) for X{E,NE}.

Proof.

Assume |Σ|2 and assume w.l.o.g. 𝚊,𝚋Σ with 𝚊𝚋. Let (α,α)𝙿𝚊𝚝Σ,𝒞Len such that α=x1𝚊 and α is an empty length constraint. So, LE(α,α)=LE(α)=Σ{𝚊} (LNE(α,α)=LNE(α)=Σ+{𝚊}). Suppose there exists some βX and length constraint β such that LE(β,β)=LE(α,α) (LNE(β,β)=LNE(α,α)). W.l.o.g. we continue with the erasing case. The following also holds for the non-erasing case with small changes. Let wLE(α,α). Then w=u𝚊 for some uΣ. As LE(α,α)=LE(β,β), there exists some hHβ such that h(β)=w=u𝚊. Then, β=β1xβ2 for xX and β1,β2X as well as u=u1u2 for some u1,u2Σ such that h(β1)=u1, h(x)=u2𝚊 and h(β2)=ε. But then, we also find some hHβ with h(x)=u1𝚋 as 𝚊 and 𝚋 are obtained at the same position by the same variable. Then, h(β)=v𝚋 for some vΣ. As v𝚋Σ{𝚊} but LE(α,α)=Σ{𝚊}, we know that β cannot exist.

With that, we know that we cannot restrict ourselves to only show hardness or undecidability for the general case, as the terminal-free case may result in something else. We begin with the membership problems for both, the erasing- and non-erasing pattern languages. Those have been shown to be NP-complete for patterns without constraints in the terminal-free and general cases (see, e.g.,[1, 18, 33]). Hence, we observe the following for patterns with length constraints.

Corollary 5.

Let (α,α)𝙿𝚊𝚝Σ,𝒞Len and wΣ. The decision problem of whether wLX(α,α) for X{E,NE} is NP-complete, even if the considered pattern α is terminal-free.

Indeed, NP-hardness is immediately obtained by the NP-hardness of patterns without any constraints. Given some (α,α)𝙿𝚊𝚝Σ,𝒞Len, NP containment follows by the fact that any valid certificate results in a substitution hHα of length at most |h(α)|=|w| and that we can check in polynomial time whether some given substitution is α-valid. This can be done by checking whether the lengths of the variable substitutions |h(x)| for all xvar(α) satisfy α. This concludes the membership problem for all cases.

Another result that can be immediately obtained is the general undecidability of the inclusion problem for pattern languages with length constraints. For pattern languages without any constraints, this problem has been generally shown to be undecidable, first, for unbounded alphabets by Jiang et al. [19] and later on for all bounded alphabets of sizes |Σ|2 by Freydenberger and Reidenbach in [11] as well as Bremer and Freydenberger in [3]. So, we have the following.

Theorem 6 ([19, 11, 3]).

Let α,βPatΣ. In general, for all alphabets Σ with |Σ|2, it is undecidable to answer whether LX(α)LX(β) for X{E,NE}.

Out of that, in the general case, we immediately obtain the following for pattern languages with length constraints.

Corollary 7.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len. For all alphabets Σ with |Σ|2 it is undecidable to answer whether LX(α,α)LX(β,β) for X{E,NE}.

That leaves us with the terminal-free cases of the inclusion problem as well as the general and terminal-free cases of the equivalence problems for both, erasing and non-erasing pattern languages with length constraints. The first and most significant main result shows that the most prominent open problem for pattern languages, i.e., the equivalence problem for erasing patterns, is undecidable for pattern languages with length constraints, even if we are restricted to terminal-free patterns and use no disjunctions in the length constraints.

Theorem 8.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len. The problem of deciding whether we have LE(α,α)=LE(β,β) is undecidable for all alphabets Σ with |Σ|2, even if α and β are restricted to be terminal-free and even if α and β have no disjunctions.

Proof.

Based on the proofs in, e.g., [19] and [11], we reduce the emptiness problem for nondeterministic 2-counter automata without input to the equivalence problem of erasing pattern languages with length constraints. Due to the length of the formal proof, here, we only give a sketch of the idea and provide the formal construction of the reduction. For a full extension of the proof containing a collection of shown properties and a formal proof of the correctness of the reduction itself, see [27].

Given an automaton A, we construct two patterns with length constraints (α,α) and (β,β) such that LE(α,α)=LE(β,β) if and only if 𝚅𝚊𝚕𝙲(A)=. The pattern with length constraints (α,α) can be seen as a pattern which allows for all possible words of a specific format to be obtained, especially also encodings of valid computations of A. The pattern with length constraints (β,β) can be seen as a tester pattern that allows only for words that are not encodings of valid computations of A that have this specific structure. By that, if A has some valid computation and, hence, an encoding of a valid computation of A exists, we have that there exists a word in LE(α,α) that is not in LE(β,β), hence LE(α,α)LE(β,β).

The previous property is the main argument used in the proofs in [19, 11] to show undecidability of the inclusion problem for erasing pattern languages. In addition to that, using length constraints, (α,α) and (β,β) can be constructed in such a way, that we always have LE(β,β)LE(α,α). This is the most significant difference to the aforementioned results. Thus, in this case, we have that the equivalence of LE(α,α) and LE(β,β) decides the emptiness problem for A. The latter is undecidable, resulting in the undecidability of the equivalence problem of erasing pattern languages with length constraints. This is similar to the idea of the construction in [26] for patterns with regular constraints, where the same approach but with differently constructed patterns and with another type of constraints has been used. We also mention that length constraints allow for a construction that shows undecidability even in the case of terminal-free patterns. This is interesting, as for terminal-free erasing pattern languages without any constraints, equivalence and inclusion are actually decidable.

Now, we continue with the constructions of (α,α) and (β,β). First, (α,α) is defined by

α=y1xvα1xvα2xvy2xvzzzzxv

for independent variables y1,y2,xv,α1,α2,z,zX, and by the following linear system α:

  • xv=5

  • z=1, z=1

Notice that any word in LE(α,α) has a suffix of fixed length with the form vuv where vΣ5 and u{0000,####,0##0,#00#}. In comparison to the construction in [11] that made use of a specific prefix and a specific suffix to enforce predicate choice, only this suffix suffices for this purpose in this construction.

Next, we construct (β,β). We set the pattern β to

β=y1β^1β^2β^μ+1y2β¨1β¨2β¨μ+1

for new and independent variables y1,y2X and terminal-free patterns β^i,β¨iX. Assume μ to be defined as in [11]. For i[μ+1], set

  • β^i=xiγixiδixi,

  • β¨i=xiηixi,

for new and independent variables xiX and terminal-free patterns γi,δi,ηiX. Assume

  • ηi=zizizizi and zizi for i[μ],

  • ημ+1=(zμ+1)4,

  • var(γiδiηi)var(γjδjηj)= if ij for i,j[μ+1], and

  • xk,y1,y2var(γiδiηi) for all i,k[μ+1]

for new and independent variables zi,ziX. Hence, all variables inside var(γiδiηi) do not occur anywhere else but in these factors. The length constraint β is defined by the following and only the following system (no additional constraints will be added later on).

  1. 1.

    i=1μ+1zi=1

  2. 2.

    i=1μzi1

  3. 3.

    i=1μ+1xi=5

  4. 4.

    xi5zi=0 for all i[μ+1]

  5. 5.

    zizi=0 for all i[μ]

For all i[μ], we say that γi and δi are defined just as in [11]. For the case of μ+1, we set γμ+1 and δμ+1 by

  • γμ+1=yμ+1, and

  • δμ+1=yμ+1

for new and independent variables yμ+1 and yμ+1.

We now give a really rough sketch of the idea on why this construction works. First, the length constraints β allow for only one β¨i, for i[μ+1], to be substituted to a non-empty word. Any word obtained by β¨i can be obtained by the suffix xvzzzzxv of (α,α). Additionally, everything obtained in β^i can then be obtained by xvα1xvα2xv in (α,α). Everything else can be obtained by the variables y1 and y2. Hence, we get the most significant property of this proof, i.e., LE(β,β)LE(α,α).

Now, for the other direction, we consider two cases. These are, given some substitution hHα, whether h(z)=h(z) or whether h(z)h(z). By α, we know |h(z)|=|h(z)|=1. It can be shown that, if h(z)=h(z), then we can use β¨μ+1 to obtain h(xvzzzzxv) and β^μ+1 to obtain h(xvα1xvα2xv). h(y1) can be obtained in y1 and the same holds analogously for h(y2) and y2. Hence, if h(z)=h(z), then h(α)LE(β,β). If on the other hand we have h(z)h(z) and h(y1)=ε as well as h(y2)=ε, we need to find some β^i and β¨i, for i[μ], to obtain h(xvα1xvα2xv) and h(xvzzzzxv). By the construction in [11], we know that this only works if h(α1) is not the encoding of a valid computation of A or if h(α2) either contains h(z) or |h(α2)| is a unary word over h(z) that is short enough (If h(y1)ε or h(y2)ε, we can either use y1 or y2 in β to obtain parts of h(α), or we also have to rely on some β¨i and β^i and the same property as before holds). Hence, if and only if there exists some valid computation for A, we can find a substitution hHα for which h(α)LE(β,β). This concludes the sketch for the binary case of the proof of Theorem 8. See Appendix C for a sketch on how to extend this to arbitrary fixed alphabets. As mentioned before, see [27] for a full formal proof of this result.

Due to the construction in the proof of Theorem 8, the undecidability of the inclusion problem for terminal-free erasing pattern languages immediately follows.

Corollary 9.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len for terminal-free patterns α,βX. The problem of answering whether we have LE(α,α)LE(β,β) is undecidable for all alphabets Σ with |Σ|2.

Proof.

Suppose it was decidable. Then, we could decide whether LE(α,α)LE(β,β) and whether LE(β,β)LE(α,α) and by that decide if we have LE(α,α)=LE(β,β). That is a contradiction to Theorem 8.

The second main result of the paper, and the last one regarding patterns with only length constraints, shows that the inclusion problem for terminal-free non-erasing pattern languages with length constraints is also undecidable for all alphabets Σ with |Σ|>2. In [32], Saarela has shown this problem to be undecidable for patterns without any constraints in the binary case.

Theorem 10 ([32]).

Let α,βX be two terminal-free patterns. Answering whether LNE(α)LNE(β) is undecidable for alphabets Σ with |Σ|=2.

Hence, the undecidability of the inclusion problem for pattern languages with length constraints follows immediately in this case.

Corollary 11.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len for terminal-free patterns α,βX. Answering whether we have LNE(α,α)LNE(β,β) is undecidable for alphabets Σ with |Σ|=2.

In the case of length constraints, a general extension to all alphabet sizes is possible. We obtain the following.

Theorem 12.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len for terminal-free patterns α,βX. Answering whether we have LNE(α,α)LNE(β,β) is undecidable for alphabets Σ with |Σ|2, even if α and β have no disjunctions.

Proof.

Again, we only give a sketch of the idea and the formal construction used in this proof. The binary case follows by Corollary 11, i.e., the result by Saarela [32]. The general proof idea is based on the construction done by Freydenberger and Bremer [3] for the binary case and significantly adapted in its extension to arbitrary alphabets. As the extension to larger alphabets is strongly based on the construction for the binary case, we give the construction for the binary case here and sketch out the extension to arbitrary alphabets in Appendix D. As for Theorem 8, due to its extensive length, the full formal proof can be found in [27].

The idea is relatively similar to the idea for the proof of Theorem 8 (or the proof in [3]) in that is utilizes a pattern (α,α) in which all words with a specific structure can be obtained, in particular, special encodings of valid computations of the specific universal Turing machine U over any input I, as defined in Appendix B, and a pattern (β,β) from which we can obtain all words that are in LNE(α,α) but these encodings of valid computations. As this result only shows undecidability of the inclusion problem, we do not have the property LNE(β,β)LNE(α,α) and only use the case distinction whether LNE(α,α)LNE(β,β) to decide the emptiness of 𝚅𝚊𝚕𝙲U(I).

We continue with the formal construction used to reduce the emptiness problem of the universal Turing machine U to the inclusion problem of terminal-free non-erasing pattern languages with length constraints in the binary case. After hat, we give a sketch of the idea on why this construction works. For the binary case, we essentially take slight adaptations of the constructed patterns from [3] but swap out any occurrence of the letter 0 with a new variable x0 and any occurrence of the letter # with a new variable x# (also similar, to some extend, to the idea in [32]). In that sense, the patterns look very similar to the constructed patterns in [3], but do not contain any terminals. With length constraints, we can restrict the obtainable words in such a way that the inclusion property LNE(α,α)LNE(β,β) decides the emptiness of 𝚅𝚊𝚕𝙲U(I).

First, we construct (β,β), as the construction of (α,α) is based on it. We define β by

β=x0xaxbx#5xax1xμ+1xbx#5r1β^1r2β^2rμ+1β^μ+1rμ+2

for new and independent variables x0,x#,xa,xb,x1,x2,,xμ+1,r1,r2,,rμ+2X and terminal-free patterns β^1,β2^,,β^μ+1X+. Notice that μ is a number given by the construction in [3]. For all i[μ+1] we say that

β^i=x0xi4x0γix0xi4x0δix0xi4x0

for terminal-free patterns γi,δiX+ that are defined later. The length constraints β are defined by the system

  • x0=1, x#=1, xa+xb=μ+2, and

  • xi=1 for all i[μ+1]

Let τ:(Σ×X)X be a homomorphism defined by τ(0)=x0, τ(#)=x#, and τ(x)=x for all xX. Then we say that for all i[μ] we have γi=τ(γi) and δi=τ(δi) for γi and δi being the patterns used in the construction in [3]. The specific construction of each such pattern is not relevant here and for details we refer to [3]. What is important is that we use the same patterns as in [3] but map them to terminal-free patterns that have the variable x0 instead of each occurrence of a terminal symbol 0 and the variable x# instead of each occurrence of #.

For the missing case of μ+1, we define

γμ+1 =x0|I|+4yμ+1
δμ+1 =yμ+1

for new and independent variables yμ+1,yμ+1X and I being the encoding of the initial configuration of U as in [3]. This will be used to obtain all substitutions h(α) where we have h(x0)=h(x#) in LNE(β,β). Now, we construct (α,α). α is defined by

α=x0μ+3x#5x0μ+1x#x0μ+1x#5tvx0α1x0vx0α2x0vt

for some patterns α1 and α2, x0 and x# defined as in β, v=x0x#x#x#x0 and t=ψ(r1β^1r2rμ+1β^μ+1rμ+2) with ψ:XX defined by ψ(x0)=x0, ψ(x#)=x#, and ψ(x)=x0 for all other xvar(β). We define α1 and α2 as α1=τ(α1) and α2=τ(α2) where α1 and α2 are the patterns given in [3, Section 4.4] for the second case.

The most important fact we take from there is that α1=##I##x#06010## for some new and independent variable xX and some encoded initial configuration I. By that, we see that τ(α1) starts with x#x#τ(I)x#x# which is used later on in this adaptation of the proof. The length constraints α are defined by the equations x0=1 and x#=1. Notice that the x in α1 has no length constraint. With that, we can now sketch out why this construction works in the binary case.

Notice, if for some hHα we have h(x0)h(x#), then we immediately obtain essentially the same construction as in [3] and can rely on their proof that shows that these patterns can be used to decide the emptiness of 𝚅𝚊𝚕𝙲U(I). That leaves us with the case h(x0)=h(x#). But then, we notice that h(α1) starts with a unary word of length at least |x#x#τ(I)x#x#|=|I|+4 that has the form h(α1)=h(x#)|I|+4=h(x0)|I|+4. So, we can use β^μ+1 to obtain h(vx0α1x0vx0α2x0v) and use the rest of (β,β) to obtain the surrounding factors just as in [3]. So, if h(x#)=h(x0), then we always have h(α)LNE(β,β). Hence, by the result from [3], there only exists some hHα for which we have h(α)LNE(β,β) if and only if 𝚅𝚊𝚕𝙲U(I), concluding this reduction for the binary case.

To extend this construction to arbitrary alphabets, we can introduce for each additional letter 𝚊Σ a new variable x𝚊 in the patterns α and β, resulting in variables x0,x#,x𝚊1,,x𝚊σ for w.l.o.g. Σ={0,#,𝚊1,,𝚊σ}. With length constraints, we can restrict the length of each substitution on these variables to 1. The construction can be extended in such a way that if any two variables of x0,x#,x𝚊1,,x𝚊σ are substituted by the same letter, then there always exists some β^i which can be used to obtain anything obtained in h(vx0α1x0vx0α2x0v). In a second extension we can ensure that we can also always find some β^i if any of the letters obtained by h(x𝚊1x𝚊σ) appears later on somewhere in h(α1) or h(α2). By that, we essentially can only obtain words not in LNE(β,β), if only the letters obtained in h(x#) and h(x0) are used in h(α1) and h(α2). This restricts us to the same cases as in [3] and allows us to, again, rely on their arguments. As mentioned before, the construction of the extension to arbitrary alphabets is outlined in Appendix D. The full formal proof is given in [27].

 Remark 13.

Indeed, by Corollary 3, Theorem 12 also implies the undecidability of the inclusion problem of terminal-free erasing pattern languages with length constraints. Hence, this is an additional approach to obtain that result other than Theorem 8.

Whether the equivalence problem of non-erasing pattern languages with length constraints is decidable, is left as an open question with no specific conjecture so far. Interestingly, also in the case of regular constraints, no definite answer for the decidability of this problem could be given, yet (cf. [26]). A summary of the current results can be found in Table 1 at the end of the final discussion.

4 Result for Pattern Languages with Regular and Length Constraints

As there are still open problems for pattern languages with regular (or length) constraints, i.e., the equivalence problem for non-erasing pattern languages with regular (or length constraints), finding an answer to this problem for a larger class of languages using both constraint-types is well motivated. Indeed, considering pattern languages with regular and length constraints suffices to obtain the undecidability of that problem. That is noticeable, since this problem is trivially decidable for patterns without any constraints. We begin with mentioning all results for decision problems on pattern languages with regular and length constraints that we can immediately obtain from pervious results.

For the membership, we immediately obtain NP-completeness for all cases, i.e., erasing- and non-erasing as well as terminal-free and general.

Corollary 14.

Let (α,rα,α)𝙿𝚊𝚝Σ,𝒞Reg,Len and wΣ. The decision problem of whether wLX(α,rα,α) for X{E,NE} is NP-complete, even if the considered pattern α is terminal-free.

Indeed, NP-hardness follows, as before, from the NP-hardness of pattern languages without any constraints. NP-containment follows from the fact NP-containment can be checked in the cases of pattern languages with regular constraints as well as pattern languages with length constraints and that no additional properties have to be checked.

As this class of pattern languages utilizes regular constraints, we immediately obtain the property mentioned before, and shown in [26], that we can always find a terminal-free pattern with regular constraints that expresses the same language as a general pattern with regular constraints.

Proposition 15 ([26]).

Let (α,rα)𝙿𝚊𝚝Σ,𝒞Reg be some pattern with regular constraints. Then, there exists some terminal-free pattern βX and a regular constraint rβ such that LX(α,rα)=LX(β,rβ) for X{E,NE}.

Using the same logic as used to show that property, we immediately obtain the same for pattern languages with regular and length constraints, as, given some pattern with regular and length constraints (α,rα,α)𝙿𝚊𝚝Σ,𝒞Reg,Len, one can introduce for each terminal letter 𝚊 a new variable x𝚊 and set L(x𝚊)={𝚊} while not changing any length constraints.

Corollary 16.

Let (α,rα,α)𝙿𝚊𝚝Σ,𝒞Reg,Len be some pattern with regular- and length constraints. Then, there exists some terminal-free pattern βX, a regular constraint rβ, and a length constraint β such that LX(α,rα,β)=LX(β,rβ,β) for X{E,NE}.

Additionally, as the inclusion problem is undecidable for all cases (E, NE, terminal-free, general) for pattern languages with regular or with length constraints, the same follows immediately for pattern languages with regular and length constraints.

Corollary 17.

Let (α,rα,α),(β,rβ,β)𝙿𝚊𝚝Σ,𝒞Reg,Len. It is undecidable to answer whether LX(α,rα,α)LX(β,rβ,β) for X{E,NE} for all alphabets Σ with |Σ|2, even if α and β are restricted to be terminal-free.

The same follows for the equivalence problem of general and terminal-free erasing pattern languages with regular- and length constraints, as also here we have the undecidability for both cases, regular constraints and length constraints, respectively. We obtain the following.

Corollary 18.

Let (α,rα,α),(β,rβ,β)𝙿𝚊𝚝Σ,𝒞Reg,Len. It is undecidable to answer whether LE(α,rα,α)=LE(β,rβ,β) for all alphabets Σ with |Σ|2, even if α and β are restricted to be terminal-free.

For all results obtained so far, no disjunction of systems of linear inequalities have had to be applied. For the equivalence problem of non-erasing pattern languages with regular and length constraints, making use of disjunctions in length constraints in addition to regular constraints allows us to obtain undecidability in this case. The following theorem concludes the third and final main result of this paper.

Theorem 19.

Let (α,rα,α),(β,rβ,β)𝙿𝚊𝚝Σ,𝒞Reg,Len. It is undecidable to answer whether LNE(α,rα,α)=LNE(β,rβ,β) for all alphabets Σ with |Σ|2, even if α and β are restricted to be terminal-free.

Proof.

As for the proofs of Theorem 8 and Theorem 12, due to its length, the full version of this proof can be found in [27]. Here, we will only give a rough sketch of the construction and the idea behind it.

The general idea of the proof works similar to the proof of Theorem 8 and the proof of the main result in [26], but needs significant adaptations to work for non-erasing pattern languages. Similar to the proof of Theorem 8, we reduce the emptiness problem for nondeterministic 2-counter automata to the equivalence problem of non-erasing pattern languages with regular and length constraints. So, given some nondeterministic 2-counter automaton without input A, we construct two patterns with regular and length constraints (α,rα,α) and (β,rβ,β) such that LNE(α,rα,α)=LNE(β,rβ,β) if and only if 𝚅𝚊𝚕𝙲(A)=. In contrast to the proofs of Theorem 8 and Theorem 12, this time, the length constraints β use a disjunction of systems of linear diophantine inequalities while (α,rα,α) has no regular or length constraints at all. Again, (α,rα,α) serves as a pattern to obtain any word with a specific structure and (β,rβ,β) serves as a pattern to obtain words with that specific structure that fulfill given properties. These properties are, again, defined by sub-patterns in (β,rβ,β) that are called predicates. The regular and length constraints rβ and β can be set in such a way that always a single predicate has to be chosen, while the variables outside of these predicates have to be substituted in a very fixed way. This results in words that have equal prefixes and suffixes in comparison to all words obtained from (α,rα,α), while the middle part is then checked with the predicates. These predicates are constructed similar to the ones in [26], just with slight adaptations, as now we have to use non-erasing substitutions instead of erasing ones. In the end, it is shown, that the predicates cover all words where the middle part is not an encoding of a valid computation in 𝚅𝚊𝚕𝙲(A). We continue with the general construction.

As we need β to define α, we will begin with the construction of (β,rβ,β). We define β by

β=r1β^1r2rμβ^μrμ+1

for new and independent variables r1 to rμ+1 for some μ to be specified later. For each i[μ], we define β^i=vγiv for v=0#30 and γi being some terminal-free pattern, also to be defined later. For each i,j[μ] with ij we assume by the following construction that var(γi)var(γj)=ε, i.e., each variable occurring in one γi does not occur anywhere else in the pattern. Also, for any word obtained by γi, we assume that it is either 0|γi| or 0u0 for some uΣ by the construction for this proof. Before giving the constraints rβ and β, we continue with the construction of α. We define α by

α=tv 0α1 0vt2

for a new and independent variable α1, v=0#30, and a word tΣ that is obtained by the following morphism. Define ψ:(Σ×X)Σ by ψ(0)=0, ψ(#)=#, and ψ(x)=0 for all xX. Then, we say that t=ψ(β). The constraints rα and α are empty, hence LNE(α,rα,α)=LNE(α)={tv}Σ+{vt2}. We proceed with the definition of the constraints rβ and β. For each i[μ+1], we define the language of the variable ri by

L(ri)={0,ψ(riβ^iri+1rμβ^μrμ+1),tψ(r1β^1r2ri1β^i1ri)}.

Thus, notice, that each ri may only be substituted to one of 3 words: Either 0, a specific suffix of t or a specific prefix of t2. Additionally, for all variables xvar(β) in β we add the constraint that #L(x), i.e., no variable can be substituted to just the letter #. Clearly, any language can be intersected with another regular language to obtain that constraint. The specific regular and length constraints for each γi will be introduced later in the proof. For the length constraints β, we construct a disjunction of systems of linear (diophantine) inequalities. Assume w.l.o.g. that for each i[μ] we have var(γi):={xi,1,xi,2,,xi,ni} for some ni. Then, we define for each i[μ], for each j[ni] of the resulting ni the system β,i,j defined by:

  • ri=|ψ(riβi^ri+1rμβ^μrμ+1)|

  • ri+1=|tψ(r1β^1r2riβ^iri+1)|

  • rk=1 for all k[μ+1]{i,i+1}

  • xi,j2

  • xk,k=1 for all k[μ] and xkvar(γk) for ki

Additionally, for some β,i,j, there are additional length constraints on the specific variables in occurring in γi. These are defined properly in the formal proof in [27] and do not interfere with the constraint that γi may be substituted to 0|γi| in certain cases. We then define β by β=β,1,1β,1,n1β,2,1β,2,n2β,μ1,1β,μ,nμ. This constraint results in that exactly two subsequent ri and ri+1 are substituted by words longer than the length of 1 while all others have exactly length 1. The other constraints result in that all variables occurring in some pattern γj with ji have to be substituted by length 1 and that at least one variable occurring in γi has to be substituted to length 2. These constraints serve selecting one γi to obtain substitutions of α1 as seen in the following parts.

As in the proof of Theorem 8, first, we have LE(β,β)LNE(α,α). Due to the regular and length constraints, it can be shown that each word wLNE(β,β) has the form w=tv0u0vtt for some uΣ. Hence, by setting h(α1)=u, we always find some hHα such that h(α)=w. On the other hand, due to the regular and length constraints, it can also be shown that the v0u0v part of w must be obtained by some β^i, i[μ], while the prefix t is obtained by r1β^1ri and the suffix tt is obtained by ri+1β^i+1rμ+1. Hence, for some hHα, to find some hHβ such that h(α)=h(β), there must exist some i[μ] such that h(β^i)=h(v0u0v), i.e., h(γi)=h(0α10). We can construct β in such a way that this is only possible, when h(α1) is not an encoding of a valid computation of A. The specific construction of all γi’s is similar to the ones given in [26]. Due to space constraints, see [27] for more details on the construction for this proof.

Following both arguments, we see that there exists some wLNE(α,α) such that wLNE(β,β) if and only if there is some valid computation for A, i.e., 𝚅𝚊𝚕𝙲(A), concluding this result. As mentioned before, see [27] for a full formal proof of this result.

The construction of the proof for Theorem 19 uses patterns that consist partly of terminal-symbols. Using Corollary 16, however, we know that we can always find terminal-free patterns that produce the exact same languages. Hence, the result also follows immediately for terminal-free patterns. This concludes all results regarding decision problems for pattern languages with regular- and length constraints. Notice that by these results all main decision problems (i.e., membership, inclusion, and equivalence) for this class of pattern languages have an answer (summarized in Table 1 at the end of the final discussion).

5 Further Discussion

In this paper, we extended the research regarding decision problems on pattern languages. To get closer to an answer for the main open questions, e.g., the equivalence problem for erasing pattern languages or the larger alphabet cases of the terminal-free inclusion problem for non-erasing pattern languages, extending the research from [26], we now introduced length constraints as an alternative approach. With the results provided in this paper, we see that nearly all problems regarding pattern langauges with length constraints, except membership, are indeed undecidable (in particular the open cases for patterns without constraints). As mentioned at the end of Section 3, similar to [26], the decidability of the equivalence problem for non-erasing pattern languages with length constraints remains an open problem.

By the help of some core ideas from [19, 11, 3, 32], adapted constructions using length (and regular) constraints could be constructed to obtain these results. Regarding our first main result, Theorem 8, similar to [26], the main novelty for the proof is found in the way the constructed checker pattern (β,β) does not produce too much anymore, while still being able to handle everything from (α,α) in a controlled manner. Length constraints helped to reduce the overhead in (β,β) to enable the approaches from the inclusion problem proof to be used for the equivalence problem. Now, due to such a construction being shown to actually work for two different constraint types, e.g., regular and length constraints respectively, finding a way to adapt the existing proofs for pattern languages without any constraints in such a way that the same can be achieved here, poses a final challenge. If this can actually be done, the main open problem for pattern languages would be settled. So far, we do not know how likely such a construction is to be found. In addition, the use of length constraints also helped to achieve undecidability for decidable problems regarding patterns without constraints, i.e., the terminal-free inclusion and equivalence problems for erasing pattern languages. Hence, the expressiveness obtained using these constraints and, thus, needed to obtain undecidability of the equivalence problem for erasing pattern languages using the approach posed in this paper, might actually be unfeasible. Regarding the terminal-free inclusion problem in the non-erasing case, due to the undecidability of the binary case having already been shown for patterns without any constraints [32], it is not unlikely that undecidability for larger alphabets can be achieved. Using the approach proposed in the current paper, we would need to somehow be able to restrict substitutions of some variables to single letters and distinct symbols in one case, and enable certain wildcards in any other case, similar to [32], where a property of binary words is used to flip a switch, determining where a word obtained from α has to be obtained in β. So far, this remains open.

Regarding the equivalence problem for non-erasing pattern languages for patterns with length constraints, we observe that this problem must be harder than the trivially decidable variant for non-erasing pattern languages without any constraints. This results from the fact that there are several NP-hard problems that are expressible solely by unions of linear diophantine equations. Some of which are, e.g., the subset sum problem, the knapsack problem, or integer programming [20]. Just to give a rough idea, we can easily embed, say, a positive integer subset sum instance (S,T) with S={S1,,Sn}, for Si with i[n], and T to an instance of the equivalence problem for non-erasing pattern languages with length constraints by setting the pattern α=x1xn, for variables x1,,xnX, defining α by setting (xi=Si+1)(xi=1), for i[n], and setting i[n]xi=T+n. Then, we set β=y, for a variable yX, with β being defined by y=T+n. Then, LNE(β,β) only has words of length T+n and LNE(α,α) only contains some words if (S,T) actually has a solution, as otherwise we cannot find a valid substitution for α. If it contains words, they have length T+n. So, both languages are equal if and only if there exists a solution for (S,T). Here, disjunctions allowed to embed the choice of setting any xi to either length 1 or length Si+1. Hence, we obtain the following.

Proposition 20.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len and let Σ be any alphabet with at least |Σ|1. Deciding whether we have LNE(α,α)=LNE(β,β) is NP-hard, even when we are restricted to terminal-free patterns α and β.

Notice that the previous result depended on the fact that we allow disjunctions in the given length constraints. The following result shows that at least in the case of unary alphabets, where we basically restrict the problem to a number setting, we can obtain NP-hardness without using disjunctions by a reduction from the the 𝟹𝚂𝙰𝚃 problem. Due to length restrictions, the proof of Proposition 21 can be found in Appendix E.

Proposition 21.

Let (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len where α and β each use no disjunctions. Deciding whether we have LNE(α,α)=LNE(β,β) is NP-hard for alphabets Σ with |Σ|=1, even when we are restricted to terminal-free patterns α and β.

For all other problems, we obtained an answer and notice that the most prominent open problems, i.e., the equivalence problem for erasing pattern languages and the inclusion problem for terminal-free non-erasing pattern languages for alphabets Σ with |Σ|3 are indeed undecidable for patterns with length constraints. For patterns with regular and length constraints, we have even seen the undecidability of the equivalence problem of non-erasing pattern languages, concluding all open problems there. A final overview of the current state of results for decision problems on patterns over various constraints can be found in Table 1. We propose the following open question for which we have no definite conjecture so far.

Question 1.

Given (α,α),(β,β)𝙿𝚊𝚝Σ,𝒞Len, is it generally decidable to answer whether LNE(α,α)=LNE(β,β)?

Table 1: Summary of the current state of results regarding the main decision problems for pattern languages with different constraints (NPC = NP-Complete, UD = Undecidable, E = Erasing, NE = Non-Erasing, T.F. = Terminal-Free, Gen. = General, () means membership problem, () means inclusion problem, and (=) means equivalence problem.). The results for No Constraints and Regular Constraints are based on previous research mentioned in the introduction. The results for Length Constraints as well as Regular and Length Constraints summarize the results of this paper.
No Constraints Len-Constraints Reg-Constraints RegLen-Constraints
 Gen.  T.F.  Gen.  T.F.  Gen.  T.F.  Gen.  T.F.
 E ()  NPC  NPC  NPC  NPC  NPC  NPC  NPC  NPC
 E ()  UD  NPC  UD  UD  UD  UD  UD  UD
 E (=) Open  NPC  UD  UD  UD  UD  UD  UD
 NE ()  NPC  NPC  NPC  NPC  NPC  NPC  NPC  NPC
 NE ()  UD  UD222Undecidable in the binary case by [32], open for larger alphabets.  UD  UD  UD  UD  UD  UD
 NE (=)  P  P Open3 Open333Gen. and T.F. NP-hard as described above. An upper bound is unknown (may as well be undecidable). Open4 Open444Depending on regular languages representation, at least PSPACE-hard [26].  UD  UD

References

  • [1] Dana Angluin. Finding patterns common to a set of strings. J. Comput. Syst. Sci., 21(1):46–62, 1980. doi:10.1016/0022-0000(80)90041-0.
  • [2] Pablo Barceló, Leonid Libkin, Anthony W. Lin, and Peter T. Wood. Expressive languages for path queries over graph-structured data. ACM Trans. Database Syst., 37(4), December 2012. doi:10.1145/2389241.2389250.
  • [3] Joachim Bremer and Dominik D. Freydenberger. Inclusion problems for patterns with a bounded number of variables. Information and Computation, 220-221:15–43, 2012. doi:10.1016/J.IC.2012.10.003.
  • [4] Joel D. Day, Pamela Fleischmann, Florin Manea, and Dirk Nowotka. Local Patterns. In Satya Lokam and R. Ramanujam, editors, FSTTCS 2017, volume 93 of LIPIcs, pages 24:1–24:14, Dagstuhl, Germany, 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPICS.FSTTCS.2017.24.
  • [5] Andrzej Ehrenfeucht and Grzegorz Rozenberg. Finding a homomorphism between two words is NP-complete. Inf. Process. Lett., 9(2):86–88, 1979. doi:10.1016/0020-0190(79)90135-2.
  • [6] Henning Fernau, Florin Manea, Robert Mercaş, and Markus L. Schmid. Revisiting Shinohara’s algorithm for computing descriptive patterns. TCS, 733:44–54, 2018. Special Issue on Learning Theory and Complexity. doi:10.1016/J.TCS.2018.04.035.
  • [7] Pamela Fleischmann, Sungmin Kim, Tore Koß, Florin Manea, Dirk Nowotka, Stefan Siemer, and Max Wiedenhöft. Matching patterns with variables under Simon’s congruence. In Olivier Bournez, Enrico Formenti, and Igor Potapov, editors, Reachability Problems, pages 155–170, Cham, 2023. Springer Nature Switzerland. doi:10.1007/978-3-031-45286-4_12.
  • [8] Dominik D. Freydenberger. A logic for document spanners. Theory of Computing Systems, 63(7):1679–1754, September 2018. doi:10.1007/S00224-018-9874-1.
  • [9] Dominik D. Freydenberger and Mario Holldack. Document spanners: From expressive power to decision problems. Theory of Computing Systems, 62(4):854–898, May 2017. doi:10.1007/S00224-017-9770-0.
  • [10] Dominik D. Freydenberger and Liat Peterfreund. The theory of concatenation over finite models. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, ICALP 2021, Proceedings, volume 198 of LIPIcs, pages 130:1–130:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ICALP.2021.130.
  • [11] Dominik D. Freydenberger and D. Reidenbach. Bad news on decision problems for patterns. Information and Computation, 208(1):83–96, January 2010. doi:10.1016/J.IC.2009.04.002.
  • [12] Dominik D. Freydenberger and Markus L. Schmid. Deterministic regular expressions with back-references. Journal of Computer and System Sciences, 105:1–39, 2019. doi:10.1016/J.JCSS.2019.04.001.
  • [13] Dominik D. Freydenberger and Nicole Schweikardt. Expressiveness and static analysis of extended conjunctive regular path queries. Journal of Computer and System Sciences, 79(6):892–909, 2013. JCSS Foundations of Data Management. doi:10.1016/J.JCSS.2013.01.008.
  • [14] Paweł Gawrychowski, Florin Manea, and Stefan Siemer. Matching Patterns with Variables Under Hamming Distance. In Filippo Bonchi and Simon J. Puglisi, editors, MFCS 2021, volume 202 of Leibniz International Proceedings in Informatics (LIPIcs), pages 48:1–48:24, Dagstuhl, Germany, 2021. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPICS.MFCS.2021.48.
  • [15] Paweł Gawrychowski, Florin Manea, and Stefan Siemer. Matching patterns with variables under edit distance. In Diego Arroyuelo and Barbara Poblete, editors, String Processing and Information Retrieval, pages 275–289, Cham, 2022. Springer International Publishing.
  • [16] Michael Geilke and Sandra Zilles. Learning relational patterns. In Jyrki Kivinen, Csaba Szepesvári, Esko Ukkonen, and Thomas Zeugmann, editors, Algorithmic Learning Theory, pages 84–98, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. doi:10.1007/978-3-642-24412-4_10.
  • [17] Oscar H. Ibarra. Reversal-bounded multicounter machines and their decision problems. J. ACM, 25(1):116–133, January 1978. doi:10.1145/322047.322058.
  • [18] Tao Jiang, Efim Kinber, Arto Salomaa, Kai Salomaa, and Sheng Yu. Pattern languages with and without erasing. International Journal of Computer Mathematics, 50(3-4):147–163, 1994.
  • [19] Tao Jiang, Arto Salomaa, Kai Salomaa, and Sheng Yu. Decision problems for patterns. Journal of Computer and System Sciences, 50(1):53–63, February 1995. doi:10.1006/JCSS.1995.1006.
  • [20] Richard M. Karp. Reducibility among Combinatorial Problems, pages 85–103. Springer US, Boston, MA, 1972. doi:10.1007/978-1-4684-2001-2_9.
  • [21] Takeshi Koshiba. Typed pattern languages and their learnability. In Paul Vitányi, editor, Computational Learning Theory, pages 367–379, Berlin, Heidelberg, 1995. Springer Berlin Heidelberg. doi:10.1007/3-540-59119-2_192.
  • [22] Anthony Widjaja Lin and Rupak Majumdar. Quadratic word equations with length constraints, counter systems, and presburger arithmetic with divisibility. In Automated Technology for Verification and Analysis, 2018.
  • [23] M. Lothaire. Combinatorics on Words. Cambridge Mathematical Library. Cambridge University Press, 2 edition, 1997.
  • [24] Marvin L. Minsky. Recursive unsolvability of Post’s problem of "tag" and other topics in theory of Turing machines. Annals of Mathematics, 74(3):437–455, 1961.
  • [25] Turlough Neary and Damien Woods. Four small universal turing machines. Fundam. Inf., 91(1):123–144, January 2009. doi:10.3233/FI-2009-0036.
  • [26] Dirk Nowotka and Max Wiedenhöft. The equivalence problem of E-pattern languages with regular constraints is undecidable. In Szilárd Zsolt Fazekas, editor, Implementation and Application of Automata, pages 276–288, Cham, 2024. Springer Nature Switzerland. doi:10.1007/978-3-031-71112-1_20.
  • [27] Dirk Nowotka and Max Wiedenhöft. The equivalence problem of E-pattern languages with length constraints is undecidable, 2025. doi:10.48550/arXiv.2411.06904.
  • [28] Enno Ohlebusch and Esko Ukkonen. On the equivalence problem for E-pattern languages. In MFCS 1996, pages 457–468. Springer Berlin Heidelberg, 1996. doi:10.1007/3-540-61550-4_170.
  • [29] Daniel Reidenbach. On the equivalence problem for E-pattern languages over small alphabets. In DLT, pages 368–380. Springer Berlin Heidelberg, 2004. doi:10.1007/978-3-540-30550-7_31.
  • [30] Daniel Reidenbach. On the learnability of E-pattern languages over small alphabets. In Learning Theory, pages 140–154. Springer Berlin Heidelberg, 2004. doi:10.1007/978-3-540-27819-1_10.
  • [31] Daniel Reidenbach. An examination of Ohlebusch and Ukkonen’s conjecture on the equivalence problem for E-pattern languages. J. Autom. Lang. Comb., 12(3):407–426, January 2007. doi:10.25596/JALC-2007-407.
  • [32] Aleksi Saarela. Hardness Results for Constant-Free Pattern Languages and Word Equations. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020), volume 168 of Leibniz International Proceedings in Informatics (LIPIcs), pages 140:1–140:15, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPICS.ICALP.2020.140.
  • [33] Markus L. Schmid. On the membership problem for pattern languages and related topics. PhD thesis, Loughborough University, January 2012. URL: https://repository.lboro.ac.uk/articles/thesis/On_the_membership_problem_for_pattern_languages_and_related_topics/9407606.
  • [34] Markus L. Schmid and Nicole Schweikardt. Document spanners - A brief overview of concepts, results, and recent developments. In PODS ’22: International Conference on Management of Data, pages 139–150. ACM, 2022. doi:10.1145/3517804.3526069.
  • [35] Takeshi Shinohara. Polynomial time inference of extended regular pattern languages, pages 115–127. Springer Berlin Heidelberg, 1983.
  • [36] Takeshi Shinohara and Setsuo Arikawa. Pattern inference, pages 259–291. Springer Berlin Heidelberg, 1995. doi:10.1007/3-540-60217-8_13.

Appendix A Definition of Nondeterministic 2-Counter Automata without Input

A nondeterministic 2-counter automaton without input (see e.g. [17]) is a 4-tuple A=(Q,δ,q0,F) which consists of a set of states Q, a transition function δ:Q×{0,1}2𝒫(Q×{1,0,+1}2), an initial state q0Q, and a set of accepting states FQ. A configuration of A is defined as a triple (q,m1,m2)Q×× in which q indicates the current state and m1 and m2 indicate the contents of the first and second counter. We define the relation A on Q×× by δ as follows. For two configurations (p,m1,m2) and (q,n1,n2) we say that (p,m1,m2)A(q,n1,n2) if and only if there exist c1,c2{0,1} and r1,r2{1,0,+1} such that

  1. 1.

    if mi=0 then ci=0, otherwise if mi>0, then ci=1, for i{1,2},

  2. 2.

    ni=mi+ri for i{1,2},

  3. 3.

    (q,r1,r2)δ(p,c1,c2), and

  4. 4.

    we assume if ci=0 then ri1 for i{1,2}.

Essentially, the machine checks in every state whether the counters equal 0 and then changes the value of each counter by at most one per transition before entering a new state. A computation is a sequence of configurations. An accepting computation of A is a sequence C1,,Cn(Q××)n with C1=(q0,0,0), CiACi+1 for all i{1,,n1}, and CnF×× for some n with n>0.

We encode configurations of A by assuming Q={q0,,qe} for some e and defining a function enc : Q××{0,#} by

enc(qi,m1,m2):=0x+i#0c1+y2m1#0c2+y2m2

for some numbers x,c1,y2,c1,y2. The values for these numbers depend on the construction of the respective proofs and are not specified here. Encodings of this kind are used to prove Theorem 8 and Theorem 19. This is extended to encodings of computations by defining for every n1 and every sequence C1,,CnQ××

enc(C1,,Cn):=##enc(C1)####enc(Cn)##.

For some nondeterministic 2-counter automaton without input A, define the set of encodings of accepting computations

𝚅𝚊𝚕𝙲(A):={enc(C1,,Cn)|C1,,Cn is an accepting computation of A}

and let 𝙸𝚗𝚟𝚊𝚕𝙲(A)={0,#}𝚅𝚊𝚕𝙲(A). The emptiness problem for deterministic 2-counter-automata is undecidable (cf. e.g. [17, 24]), thus it is also undecidable whether a nondeterministic 2-counter automaton without input has an accepting computation [11, 19]. That the emptiness problem for universal Turing machines is undecidable is a known fact.

Appendix B Definition of the Universal Turing Machine 𝑼

Here, we define the universal Turing machine U used in the referenced proof from [3] and referred to in the proof sketch of Theorem 12. Let U=(Q,Γ,δ) be the universal Turing machine U15,2 wih 2 symbols and 15 states as described by Neary and Woods [25]. This machine hat the states Q={q1,,q15} and the tape alphabet Γ={0,1}. The transition function δ:Γ×Q(Γ×{L,R}×Q)𝙷𝙰𝙻𝚃 is given in Table 2.

We follow with the definition of encodings of computations of U as depicted in [3]. The following conventions are needed to discuss configurations of U. The tape content of any configuration of U is characterized by two infinite sequences tL=(tL,n)n0 and tR=(rR,n)n0 over Γ. The sequence tL describes the left side of the tape, the sequence starting at the head position of U (including) and extending to the left. Analogously, tR describes the right side of the tape, the sequence starting directly after the head position and extending to the right. A configuration C=(qi,tL,tR) of U is a triple consisting of a state qi, a left side of the tape tL and a right side of the tape tR.

Let e:ΓN be a function defined by e(0):=0, e(1):=1, and the extension to to infinite sequences t=(tn)n0 over Γ by e(t):=i=0e(ti). As in each configuration of U only a finite number of cells consist of no blank symbol (0), e(t) is always finite and well-defined. Notice that we can always obtain the symbol that is closest to the head by e(t)mod2 (the symbol at the head position in the case of tL and the symbol right of the head position in the case of tR). By multiplying or dividing the encoding e(t) by 2, each side can be lengthened or shortened, respectively. The encoding of configurations of U indirectly referred to in this paper is defined by

encNE(qi,tL,tR)=070e(tR)#070e(tL)#0i+6

for every configuration (qi,tL,tR). Recall that i>0 as qi{q1,,q15}. A computation 𝒞=(C1,,Cn) on U is a finite sequence of configurations of U. It is valid if C1=I (I being some initial configuration), Cn is a halting configuration, and Ci+1 is a valid successor configuration of Ci, for i[n1], as defined by δ. In [3], the notion is adopted that any possible configuration where both tape sides have a finite value under e is a valid successor configuration of a halting configuration. The encoding of computations of U is given analogously to the definition in the case of nondeterministic 2-counter automata without input, i.e., for some computation 𝒞=(C1,,cn), we have

encNE(𝒞)=##encNE(C1)##encNE(c2)####encNE(Cn)##.

Finally, also analogous to nondeterministic 2-counter automata without input, let

𝚅𝚊𝚕𝙲U(I)={encNE(𝒞)𝒞 is a valid computation from I}.
Table 2: Transition table of U, i.e., definition of δ, as it is given in [3] or [25].
q1 q2 q3 q4 q5 q6 q7 q8
0 (0,R,q2) (1,R,q3) (0,L,q7) (0,L,q6) (1,R,q1) (1,L,q4) (0,L,q8) (1,L,q9)
1 (1,R,q1) (1,R,q1) (0,L,q5) (1,L,q5) (1,L,q4) (1,L,q4) (1,L,q7) (1,L,q7)
q9 q10 q11 q12 q13 q14 q15
0 (0,R,q1) (1,L,q11) (0,R,q12) (0,R,q13) (0,L,q2) (0,L,q3) (0,R,q14)
1 (1,L,q10) 𝙷𝙰𝙻𝚃 (1,R,q14) (1,R,q12) (1,R,q12) (0,R,q15) (1,R,q14)

Appendix C Extension to Larger Alphabets in Theorem 8

To extend the proof idea for Theorem 8 to arbitrary larger alphabets Σ with |Σ|>2, we fix w.l.o.g. Σ={0,#,𝚊1,𝚊2,,𝚊σ} (hence, σ1). We can construct adapted patterns with length constraints (ασ,ασ) and (βσ,βσ) in the following manner. The idea of this adapted construction is also similar to [11], but adapted to this setting and for terminal-free patterns. First, we define ασ by

ασ=y1xvα1xvα2xvy2xvz(z)2(z𝚊1)2(z𝚊2)2(z𝚊σ)2zxv.

Notice, that only the parts containing the z’s is changed. We construct the adapted length constraints ασ in the following manner:

  • xv=5, z=1, z=1,

  • z𝚊i=1 for all i[σ]

An immediate observation that can be made, as its done in [11], is the following:

Observation 22 ([11]).

Let n1, x1,,xnX, and 𝚊1,𝚊2,,𝚊nΣ. Consider the pattern

p=x1x2x2x3x3xnxnx1.

Then, for each homomorphism h with h(p)=𝚊1𝚊2𝚊2𝚊3𝚊3𝚊n𝚊n𝚊1 we have that h(xi)=𝚊i for all i[n].

Now, we explain the changes necessary to create βσ and βσ. First, for each ηi with i[μ], we change its form from ηi=zizizizi to

ηi=zizizizi,𝚊1zi,𝚊1zi,𝚊σzi,𝚊σzi.

Now, instead of adding ημ+1 as before, we add a series of n many pairs of patterns β^j and β¨j, for j[μ+n][μ], that cover the cases that two different z variables are substituted equally in ασ. In each of these newly defined pair of patterns, we would set γi and δi just to single, new, and independent variables to obtain any substitution h(α1) and h(α2). The number of those additional pairs of patterns rises significantly for alphabets of larger size, hence we only give an example here.

Consider the case that h(z)=h(z𝚊2). To handle this case, we would add a predicate πμ+k, for some k[n], such that

ημ+k=zμ+k(zμ+k)2(zμ+k,𝚊1)2(zμ+k)2(zμ+k,𝚊3)2(zμ+k,𝚊σ)2zμ+k

and γμ+k=yμ+k as well as δμ+k=yμ+k, where each new variable is new and independent from other parts of the pattern. In particular, notice, that in this example we have no occurrence of the variable zμ+k,𝚊2 and instead four occurrences of zμ+k. The idea is similar to the previous construction of ημ+1 in the binary case to handle the case of h(z)=h(z).

We observe, that if, for some substitution hHασ, any two different variables with the base name z are substituted equally, then there exists one pair of patterns β^i and β¨i that have the same variables equalled out, hence resulting in the existence of some hHβσ for which h(ασ)=h(βσ).

Additionally, we add 2σ many pairs of patterns β^j and β¨j, for μ+n<jμ+n+2σ that handle the occurrence of any of the letters h(z𝚊1) to h(z𝚊σ) in h(α1) or h(α2). These predicates work exactly as in [11], just with the same addition as before that h(z) and h(z) determine the letters that need to be used for the encodings of valid computations and for the substitutions of xv. All other arguments stay the same. Hence, we conclude the sketch of the extension to larger alphabets for Theorem 8.

Appendix D Extension to Larger Alphabets in Theorem 12

To extend the proof idea for Theorem 12 to arbitrary larger alphabets Σ with |Σ|>2, we fix w.l.o.g. Σ={0,#,𝚊1,𝚊2,,𝚊σ} (hence, σ1). The basic idea stays the same, but the adaptations to the constructed patterns (α,α) and (β,β) are more substantial.

For each new letter 𝚊i, we introduce a new variable x𝚊i that may be substituted by only a single letter. Similar to the extension to arbitrary alphabets in Theorem 12, the patterns (α,α) and (β,β) are adapted in such a way that two additional cases are considered for. First, if some substitution hHα replaces two distinct variables in {x0,x#,x𝚊1,,x𝚊σ} to the same letter, i.e., h(x𝚊)=h(x𝚋) for x𝚊,x𝚋{x0,x#,x𝚊1,,x𝚊σ}, then we should always find a substitution hHβ such that h(α)=h(β). Next, we make sure that only the letters h(x#) and h(x0) may appear in h(α1) and h(α2). Otherwise, we should also always find some substitution hHβ such that h(α)=h(β).

We continue with the specific adapted constructions. First, each β^i is redefined to use the structure β^i=x0x#4x0𝐱𝟎𝐱#𝐱𝚊𝟏𝐱𝚊𝟐𝐱𝚊σγ𝐢x0x#4x0𝐱𝟎𝐱#𝐱𝚊𝟏𝐱𝚊𝟐𝐱𝚊σδix0x#4x0. Notice that we just add each variable in {x0,x#,x𝚊1,,x𝚊σ} in front of each γi and δi, for i[μ+1]. Additionally, we redefine β by adding x0x#x𝚊1x𝚊2x𝚊σ in front of it. The adapted length constraints of β look almost identical, but with the addition of constraining the length of all variables in {x0,x#,x𝚊1,,x𝚊σ} to 1 and changing the sum of xa+xb to correspond to the new number of β^i’s that is explained in the following. In particular, we have x0=1, x#=1, xa+xb=μ+1+2σ+n+1, x𝚊i=1 for all i[σ], and xi=1 for all i[μ+1+2σ+n].

To define the adaptation of (α,α), we need to extend the morphism ϕ to take into account the newly defined variables. Simply, for all variables xX with x{x𝚊1,,x𝚊σ}, ϕ is defined just as in the binary case. For all other x𝚊i{x𝚊1,,x𝚊σ}, we define ϕ(x)=x𝚊i. Now, we add the prefix x0x#x𝚊1x𝚊2x𝚊σ to both, α1 and α2. Finally, we also add the prefix x0x#x𝚊1x𝚊2x𝚊σ to the whole of α. We add to the length constraints α that we also have x𝚊i=1, for all i[σ]. We continue with the construction of all β^i to cover the additional two emerging cases. Due to space constraints, all formal proofs regarding the correctness of this extension to the reduction can be found in [27].

First, for each i[σ], we can add two β^, i.e., β^μ+1+i and β^μ+1+σ+i, which cover the case that any letter obtained by {x𝚊1,,x𝚊σ} occurs somewhere later on in h(α1) or h(α2), for hHα. For β^μ+1+i, we define γμ+1+i and δμ+1+i just by

  • γμ+1+i=x0x0x#x𝚊1x𝚊2x𝚊σyμ+1+i,1x𝚊iyμ+1+i,2

  • δμ+1+i=x0x0x#x𝚊1x𝚊2x𝚊σyμ+1+i

for new and independent variables yμ+1+i,1,yμ+1+i,2,yμ+1+i. We can define β^μ+1+σ+i analogously, just by swapping the role of γ and δ.

Finally, we need to cover the case that h(x𝚊)=h(x𝚋), for some hHα,𝚊,𝚋{x0,x#,x𝚊1,,x𝚊σ}. Assume these variables to be x𝚊i and x𝚊j, for some i<j. Then, we just define β^μ+1+2σ+k, for k[n] (n being the total number of combinations), by setting γ und δ to

  • γμ+1+2σ+k=x0x0x#x𝚊1x𝚊ix𝚊j1x𝚊ix𝚊j+1x𝚊σyμ+1+2σ+k, and

  • δμ+1+2σ+k=x0x0x#x𝚊1x𝚊ix𝚊j1x𝚊ix𝚊j+1x𝚊σyμ+1+2σ+k.

So, if and only if the same symbol is used twice in the prefix x0x#x𝚊1x𝚊2x𝚊σ, then we can use these β^s to find some hHβ such that h(α)=h(β).

By the previous two constructions, all additional cases that emerge due to the larger alphabet are captured by these additional β^s, so they cannot result in substitutions hHα with h(α)LNE(β,β). Hence, only analogous cases to the binary construction can result in words that are not in LNE(β,β), concluding the sketch of the extension to larger alphabets.

Appendix E NP-hardness of the Unary Case for the Equivalence Problem for Non-Erasing Pattern Languages with Length Constraints

This section is dedicated to show Proposition 21. W.l.o.g. assume Σ={0}. We proceed with a reduction from 𝟹𝚂𝙰𝚃. Assume w.l.o.g. 𝒳={X1,X2,} to be a set of boolean variables and let 𝒳¯={X1¯,X2¯,} be the set of their negations. Let φ=(φ1,φ2,,φn) for n with n2 be a 𝟹𝚂𝙰𝚃 formula in conjunctive normal form over 𝒳𝒳¯ such that for each clause φi with i[n] we have w.l.o.g. φi=(Xi,1Xi,2Xi,3) with Xi,j𝒳𝒳¯ for j{1,2,3}. We define a function f:𝒳𝒳¯X that maps each boolean variable to a pattern variable by

f(X)={ui, if x𝒳 and x=Xivi, if x𝒳¯ and x=Xi¯

for new and independent pattern variables ui and vi. Clearly, one of the two cases is always fulfilled for each boolean variable X𝒳𝒳¯. We proceed by defining the pattern (α,α) by

α=f(X1,1)f(X1,2)f(X1,3)y1f(X2,1)f(X2,2)f(X2,3)y2f(Xn,1)f(Xn,2)f(Xn,3)yn

for new and independent variables yiX with i[n]. Notice that this pattern is terminal-free. The length constraints α are defined by the following system:

  • ui+vi=3 for all i[n]

  • yi3 for all i[n]

  • f(Xi,1)+f(Xi,2)+f(Xi,3)+yi=7 for all i[n]

  • i=1n(f(Xi,1)+f(Xi,2)+f(Xi,3)+yi)=7n

Notice, that the final constraint results in that LNE(α,α) may only have the word 07n in it, as it restricts the length of all symbols in α together. Now, we set the second pattern with length constraints (β,β) by

β=z

for a new and independent variable zX and define the length constraint β by z=07n. Hence, LNE(β,β)={07n}.

To proof the reduction, first, assume there exists a satisfying assignment of variables ϕ for φ. Let h be a substitution such that h(ui)=00 and h(vi)=0 if and only if ϕ(Xi)=true and ϕ(Xi¯)=false. Otherwise, set h(ui)=0 and h(vi)=00. As ϕ is satisfying φ, we know that 4|h(f(Xi,1)f(Xi,2)f(Xi,3))|6 for all i[n]. Hence, we can always set h(yi)=07|h(f(Xi,1)f(Xi,2)f(Xi,3))| and obtain |h(f(Xi,1)f(Xi,2)f(Xi,3)yi)|=7 for all i[n]. Hence, we h must be α-valid and we have h(α)=07n. As this is the only word we may obtain for LNE(α,α), we have LNE(α,α)=LNE(β,β).

For the other direction, assume LNE(α,α)=LNE(β,β)={07n}. So, there exists some hHα such that h(α)=07n. By α, we know h(f(Xi,1)f(Xi,2)f(Xi,3)yi)=7 for all i[n]. As |h(yi)|3 must be the case, for each i[n], there exists some j{1,2,3} such that |h(f(Xi,j))|>1, hence by α, we must have |h(f(Xi,j))|=2, resulting in h(f(Xi,j))=00. We set an assignment of variables ϕ by Xi=true and Xi¯=false if and only if h(ui)=00 and h(xi)=0, otherwise set Xi=false and Xi¯=true. As |h(ui)|+|h(vi)|=3, we know such an assignment is a valid assignment for φ. As for each clause ϕi for i[n] we find a variable that is set to true, we know that φ is satisfied by ϕ, concluding the reduction.