A Tropical Approach to the Compositional Piecewise Complexity of Words and Compressed Words

Schnoebelen, Philippe; Veron, J.; Vialard, Isa

doi:10.4230/LIPIcs.ICALP.2025.171

A Tropical Approach to the Compositional Piecewise Complexity of Words and Compressed Words

Philippe Schnoebelen

Université Paris-Saclay, CNRS, ENS Paris-Saclay, Laboratoire Méthodes Formelles, Gif-sur-Yvette, France J. Veron Université Paris-Saclay, CNRS, ENS Paris-Saclay, Laboratoire Méthodes Formelles, Gif-sur-Yvette, France Isa Vialard

Max Planck Institute for Software Systems, Saarland Informatics Campus, Saarbrücken, Germany

Abstract

We express the piecewise complexity of words using tools and concepts from tropical algebra. This allows us to define a notion of piecewise signature of a word that has size $\log(n)m^{O(1)}$ where $m$ is the alphabet size and $n$ is the length of the word. The piecewise signature of a concatenation can be computed from the signatures of its components, allowing a polynomial-time algorithm for computing the piecewise complexity of SLP-compressed words.

Keywords and phrases:

Tropical semiring, Subwords and subsequences, piecewise complexity, SLP-compressed words

Category:

Track B: Automata, Logic, Semantics, and Theory of Programming

Copyright and License:

2012 ACM Subject Classification:

Theory of computation

\rightarrow

Formal languages and automata theory

Acknowledgements:

Work started while the third author was at Univ. Paris-Saclay.

DOI:

10.4230/LIPIcs.ICALP.2025.171

Event:

52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Editors:

Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

Subwords and piecewise complexity

For two words $u, v$ , we say that $u$ is a subword of $v$ , written $u\preccurlyeq v$ , if $u$ can be obtained by removing some letters (possibly all, possibly none) from $v$ at arbitrary positions. For example, $\mathtt{ABBA}\preccurlyeq\mathtt{ABRACADABRA}$ . This notion corresponds to subsequences in other areas, and is more general than the usual concept of factors of a word.¹¹1Some authors use the terminology “subword” for factors, and then use “scattered subword” for subwords.

Compared to factors, subwords are not so easy to reason about, and correspondingly they have been much less studied. However, they certainly are a fundamental notion, appearing naturally in many areas of formal languages, algorithmics, data sciences, logic, computational linguistics, genomics, etc.

In language theory, semigroup theory, and logic, an important concept is piecewise-testability, introduced in 1972 by Imre Simon [27]. This corresponds exactly to the first level in Straubing-Thérien hierarchy of first-order definable languages [6] and has been extended in many directions (infinite words [4], trees [2], any well quasi-ordered set [11]).

In this paper, we are interested in subword-based descriptive complexity of words. In particular, the piecewise-complexity of words and languages has been used in [15] for bounding the complexity of decidable fragments of the logic of subwords (see also [12]). In essence, the piecewise-complexity of a word (or any piecewise-testable object) is the minimum size of the pieces required to specify it. Regarding this measure, a recent algorithmic contribution is [26], providing efficient algorithms for computing the piecewise-complexity of individual words. The algorithms and the reasoning behind them are quite involved and it is not trivial to extend them for, e.g., periodic words [23].

It turns out that many of the computations performed in [26] have a simple description in the framework of tropical algebra, i.e., the $(\min,+)$ -semiring. Indeed, some fundamental operations of [26] are just linear transformations in a tropical sense. This realisation is the starting point for this paper.

Our contribution

In a first part, we rephrase the results of [26] in the language and notation of tropical algebra. This setting makes it easier to develop a compositional way of computing piecewise-complexity: we associate with any word $u$ a (piecewise) signature $\sigma(u)$ that has size logarithmic in $|u|$ (and polynomial in the size of the alphabet), and that contains enough information to allow extracting, among other things, the piecewise complexity of $u$ . Furthermore, signatures can be computed compositionally, i.e., $\sigma(uv)$ is obtained in polynomial-time from $\sigma(u)$ and $\sigma(v)$ .

In a second part we use this apparatus to compute the piecewise complexity of compressed words in time polynomial in the size of their compressed description, solving a problem raised in [25]. Reasoning on (e.g., searching) and handling (e.g., editing) compressed data structures is an important field of algorithmics and compressed words are one of the most fundamental and most fruitful instantiations of the paradigm, see [17]. While our algorithm is a simple derivation of the results from the first part, its complexity analysis is quite involved and is the main technical difficulty in this paper. We expect that the techniques behind piecewise signatures can be useful for other subword-related problems on compressed words.

We hope to convince our readers that our contribution is made more elegant and technically simpler through the use of tropical linear and multilinear algebra. This allows one to recruit familiar notions of vectors, matrices, their products, their monotonicity, thus lightening the brain load required for following through our reasoning. Tropical algebra was introduced in the area of formal languages and automata by Imre Simon himself (in [29], see also [21]) in connection with the star-height problem, opening the way to the algebraic and algorithmic theory of weighted automata, now a very active area [7, 18]. In this respect it is surprising that Simon did not use tropical algebra in his fundamental work on piecewise-testable languages [27, 28] on which we heavily rely and whose bases we rephrase tropically.

Related work

Related to piecewise-complexity of words is the piecewise-distance $\delta(u,v)$ of two words, which has been more studied and is the topic of several recent papers [8, 1, 10]. Indeed the piecewise-complexity of $u$ can be defined as $1+\max_{v\neq u}\delta(u,v)$ [26] but this characterisation does not provide useful algorithms. The range of related questions include reconstructing words from their subwords [16], solving word equations [20], or computing edit-distance [20, 5], etc., modulo Simon’s congruence.

Outline of the paper

Section 2 recalls the basic concepts, notations, and results used in the paper. Section 3, the main conceptual contribution, shows how to use tropical algebra for computing piecewise complexity compositionally, introduces piecewise signatures with their associated algorithms. Section 4 applies this to SLP-compressed words. Finally, Section 5 handles the complexity analysis and contains the main technical difficulties. For readability concerns, some proofs are omitted from the main text and can be found in the Appendix.

2 Basic notions

2.1 Subwords and the piecewise complexity of words

We follow [26]. For a word $u=a_{1}a_{2}\cdots a_{n}$ of length $|u|=n$ and two positions $i, j$ such that $0\leq i\leq j\leq n$ , we write $u(i,j)$ for the factor $a_{i+1}a_{i+2}\cdots a_{j}$ . Note that $u(0,|u|)=u$ , that $u(i_{1},i_{2})\cdot u(i_{2},i_{3})=u(i_{1},i_{3})$ , and that $|u(i,j)|=j-i$ . We write $u(i)$ as shorthand for $u(i-1,i)$ , i.e., $a_{i}$ , the $i$ th letter of $u$ . Finally, we use $u(i,)$ and $u(,j)$ as convenient notations for $u(i,n)$ and, respectively, $u(0,j)$ : they are the $i$ th prefix and the $j$ th suffix of $u$ , with the understanding that this numbering starts with a “ $0$ th” prefix, always equal to the empty word $\epsilon$ , and that suffixes get shorter and shorter when their index grow while the length of prefixes increases.

Fix a $m$ -letter alphabet $\Sigma=\{a_{1},a_{2},\ldots,a_{m}\}$ . A word $u\in\Sigma^{*}$ is a subword of $v\in\Sigma^{*}$ , written $u\preccurlyeq v$ , if there is a factorization $v=v_{0}u_{1}v_{1}u_{2}v_{2}\cdots u_{\ell}v_{\ell}$ with factors $v_{0},\ldots,v_{\ell},u_{1},\ldots,u_{\ell}\in\Sigma^{*}$ and such that $u=u_{1}\cdots u_{\ell}$ [24]. When furthermore $\ell=1$ then $u$ is also a factor of $v$ . For example, both $u_{1}=\mathtt{AA}$ and $u_{2}=\mathtt{BB}$ are subwords of $v=\mathtt{ABBA}$ but only $u_{2}$ is a factor of $v$ . The longest subword of $u$ is $u$ itself, its shortest subword is the empty word, denoted $\epsilon$ . Subwords are the usual concept of subsequences in the special case of words, i.e., finite sequences of letters.

For given $m\in\mathbb{N}$ , we write $v\sim_{m}v^{\prime}$ when $v$ and $v^{\prime}$ have the same subwords of length at most $m$ [27]. For example $\mathtt{ABBA}\sim_{2}\mathtt{BABA}$ while $\mathtt{ABBA}\not\sim_{3}\mathtt{BABA}$ since, e.g., $\mathtt{ABB}\preccurlyeq\mathtt{ABBA}$ and $\mathtt{ABB}\not\preccurlyeq\mathtt{BABA}$ . In situations where $u$ is subword of only one among two words $v$ and $v^{\prime}$ , we say that $u$ is a (piecewise) distinguisher (also a separator) between $v$ and $v^{\prime}$ .

Then $v\sim_{k}v^{\prime}$ means that $v$ and $v^{\prime}$ have no distinguisher of length $k$ or less, or that their shortest distinguisher (guaranteed to exist when $v\neq v^{\prime}$ ) has length at least $k+1$ . Each $\sim_{k}$ is an equivalence relation over $\Sigma^{*}$ , with $\sim_{0}=\Sigma^{*}\times\Sigma^{*}$ being the trivial “always true” relation, and $u\sim_{1}v$ holding if, and only if, $u$ and $v$ use the same subset of letters. Every $\sim_{k}$ has finite index [28, 14] and is a refinement of the previous $\sim_{k-1}$ , i.e., $u\sim_{k}v\implies u\sim_{k-1}v$ , with $\bigcap_{k\in\mathbb{N}}\sim_{k}=\textit{Id}$ . One speaks of “Simon’s congruence” in view of $u\sim_{k}u^{\prime}\land v\sim_{k}v^{\prime}\implies u\,v\sim_{k}u^{\prime}v^{\prime}$ .

The piecewise complexity of a word $u$ , denoted $h(u)$ , is the smallest $k$ such that $u\sim_{k}v\implies u=v$ [15, 26]. In terms of distinguishers, $h(u)=k$ means that any $v\neq u$ can be separated from $u$ by a piecewise distinguisher of length $\leq k$ . We refer to [15, 26] and the references therein for more motivations and applications of piecewise complexity.

2.2 Computing $h(u)$

An $O(|u|\cdot|\Sigma|)$ time algorithm computing $h(u)$ is given in [26]. It is based on a characterisation of $h(u)$ in terms of piecewise-distances. For a word $u\in\Sigma^{*}$ and a letter $a\in\Sigma$ , we follow [27, 28] and define the right and left “side distances”:

\displaystyle r(u,a)

\displaystyle\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\max\{k% \leavevmode\nobreak\ |\leavevmode\nobreak\ u\sim_{k}u\,a\}\>,

\displaystyle\ell(a,u)

\displaystyle\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\max\{k% \leavevmode\nobreak\ |\leavevmode\nobreak\ a\,u\sim_{k}u\}\>.

The two notions are mirror of one another: $\ell(a,u)=r(u^{\mathsf{R}},a)$ , where $u^{\mathsf{R}}$ denotes the reverse, or the mirror image, of $u$ .

One can reduce the computation of $h$ to the computation of $r$ and $\ell$ in view of the following characterisation (see [26, Prop. 3.1]):

h(u)=\max_{\begin{subarray}{c}0\leq i\leq|u|\\ a\in\Sigma\end{subarray}}r(u(,i),a)+\ell(a,u(i,))+1\>.

(1)

In order to compute $r(u,a)$ , Schnoebelen and Vialard introduce the following characterisation:

Lemma 2.1 ([26, Lem. 3.8]).

For any word $u\in\Sigma^{*}$ and any letters $a,b\in\Sigma$ , the following hold:

	$\displaystyle r(\epsilon,a)=$	$\displaystyle 0\>,$		(2)
	$\displaystyle r(ub,a)=$	$\displaystyle\begin{cases}r(u,a)+1&\text{if $a=b,$}\\ \min\bigl{(}r(u,b)+1,r(u,a)\bigr{)}&\text{if $a\neq b.$}\end{cases}$		(3)

Following [26], the $r$ -table of $u$ , denoted $R_{u}$ , is the $m$ -by- $(n+1)$ matrix with entry $r\bigl{(}u(,j),a_{i}\bigr{)}$ in column $0\leq j\leq n$ and row $1\leq i\leq m$ , where we assume some fixed ordering $\{a_{1},\ldots,a_{m}\}$ of the letters in $\Sigma$ . There is a similar $\ell$ -table, denoted $L_{u}$ , with entries $\ell\bigl{(}a_{i},u(j,)\bigr{)}$ . The key point is that, by Eq. (1), computing $h(u)$ amounts to finding the maximum value in the array $R_{u}+L_{u}$ obtained by summing the two matrices.

Example 2.2.

Here are $R_{u}$ and $L_{u}$ for $u=\mathtt{CBBCAC}$ , assuming $\Sigma=\{\mathtt{A},\mathtt{B},\mathtt{C}\}$ :

With Eq. (1), one sees that $h(u)=4$ , obtained as $r(u[,4],\mathtt{C})+\ell(\mathtt{C},u[4,])+1=2+1+1=4$ . This maximum is not uniquely attained: $r(u[,3],\mathtt{C})+\ell(\mathtt{C},u[3,])+1=1+2+1=4$ . Note that $\mathtt{CBBCCAC}\sim_{3}\mathtt{CBBCAC}=u$ , so that indeed $h(u)$ is necessarily larger than 3.

3 Tropical algebra and piecewise complexity

3.1 The tropical semiring

By tropical algebra we mean the semiring $\langle\mathbb{N}\cup\{+\infty\};\min,+\rangle$ , with $+\infty$ as zero and $0$ as unit, also called the $(\min,+)$ -semiring. It is a semiring rather than a ring because there is no additive inverse and no subtraction. In the following we use $\ast$ as a less cumbersome notation for $+\infty$ ,²²2It is common to use $\epsilon$ as a light notation for the zero in tropical rings but we reserve $\epsilon$ to denote the empty word. and $\mathbb{N}_{\ast}$ as shorthand for $\mathbb{N}\cup\{\ast\}$ (and also for the whole structure). We refer to [9, 22, 18] for more in-depth introductions and simply observe that, modulo a sign change, $(\min,+)$ -semirings are isomorphic to the more familiar $(\max,+)$ -semirings one typically encounters in operations research [13], combinatorics [3], economics, etc.

We use the standard linear ordering on $\mathbb{N}_{\ast}$ , given by $0<1<2<\cdots<i<i+1<\cdots<\ast$ . Both semiring operations are monotonic w.r.t. $\leq$ : for any $x,x^{\prime},y,y^{\prime}\in\mathbb{N}_{\ast}$ , $x\leq x^{\prime}$ and $y\leq y^{\prime}$ together imply $\min(x,y)\leq\min(x^{\prime},y^{\prime})$ and $x+y\leq x^{\prime}+y^{\prime}$ . Finally, note that $\ast$ , the zero of $\mathbb{N}_{\ast}$ , is also its maximal element.

Tropical semirings lead to their own flavour of linear algebra. In $\mathbb{N}_{\ast}$ , a linear combination of variables $\bm{x}=(x_{1},\ldots,x_{k})$ has the form

f(x_{1},\ldots,x_{k})=\min_{j=1}^{k}(\alpha_{j}+x_{j})\>,

where the $\alpha_{j}$ ’s are scalars from $\mathbb{N}_{\ast}$ . Such an $f$ can be represented by a line vector $\bigl{(}\alpha_{1}\>\alpha_{2}\>\cdots\>\alpha_{k}\bigr{)}$ , simply written $\bm{\alpha}$ , so that $f(\bm{x})$ is the (tropical) dot product $\bm{\alpha}\cdot\bm{x}$ .

Since $\mathbb{N}_{\ast}$ is only a commutative semiring and not a field, these linear combinations do not form a proper vector space, not even a module, but only a semimodule over $\mathbb{N}_{\ast}$ . However, and for our limited purposes, the usual notions of vector and matrix sums and products work as expected so the reader should be on familiar ground. The next paragraphs introduce the notations we need and collect the main properties we shall rely on.

We will often use the vector $\bm{e}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\bigl{(}0\;0\;\cdots% \;0\bigr{)}$ , made up of units only (not zeroes) and with dimension $m$ to be inferred from context. We write ${\bm{e}}_{i}$ for the vector $\bigl{(}\ast\;\cdots\;\ast\;0\;\ast\;\cdots\;\ast\bigr{)}$ that has a single unit in position $i$ , surrounded by zeroes. Thus ${\bm{e}}_{i}$ is the $i$ th column of the tropical identity matrix $I_{m}$ and $\bm{e}$ is its diagonal. As expected, $({\bm{e}}_{i})_{i=1,\ldots,m}$ is a base for $\mathbb{N}_{\ast}^{m}$ , the free $m$ -dimensional $\mathbb{N}_{\ast}$ -semimodule.

In $\mathbb{N}_{\ast}^{m}$ , a linear application is a mapping $F:\mathbb{N}_{\ast}^{m}\to\mathbb{N}_{\ast}^{n}$ of the form $F(x_{1},\ldots,x_{m})=\bigl{(}f_{1}(x_{1},\ldots,x_{m}),\ldots,f_{n}(x_{1},% \ldots,x_{m})\bigr{)}$ where each $f_{i}$ is a (tropical) linear combination. As in classical linear algebra, $F$ can be represented by a $n$ -by- $m$ matrix $M_{F}$ , where row $i$ contains the scalars defining $f_{i}$ , so that $F(\bm{x})=M_{F}\cdot\bm{x}$ .

The ordering on $\mathbb{N}_{\ast}$ leads to a partial ordering on tropical vectors and matrices. For two vectors $\bm{\alpha},\bm{\beta}\in\mathbb{N}_{\ast}^{m}$ , we write $\bm{\alpha}\leq\bm{\beta}$ when $\alpha_{i}\leq\beta_{i}$ for all $i=1,\ldots,m$ . Similarly, given two tropical matrices $A$ and $B$ of same heights and widths, we write $A\leq B$ when each value $A_{i,j}$ in $A$ is dominated by the corresponding $B_{i,j}$ in $B$ . Then $A\leq B$ and $\bm{\alpha}\leq\bm{\beta}$ together imply $A\cdot\bm{\alpha}\leq B\cdot\bm{\beta}$ , i.e., “linear applications are monotonic”. Note that the tropical sums of vectors $\bm{\alpha},\bm{\beta}$ or matrices $M,M^{\prime}$ , are computed by applying $\min$ component-wise and coincide with infimums, or meets, in the lattices $(\mathbb{N}_{\ast}^{m},\leq)$ and $(\mathbb{N}_{\ast}^{m\times n},\leq)$ : for this reason we denote them with $\bm{\alpha}\sqcap\bm{\beta}$ and $M\sqcap M^{\prime}$ instead of “ $\min(\bm{\alpha},\bm{\beta})$ ” or “ $\min(M,M^{\prime})$ ” which suggests picking the smaller vector or matrix. We still use $\min$ for scalars since, in that case, the two interpretations coincide.

3.2 A tropical view of the $𝒓$ - and $\ell$ -tables

The starting point of this paper is the realisation that Lemma 2.1 actually describes the $j+1$ th column of $R_{u}$ as a tropically linear image of its $j$ th column. For this we associate a tropical matrix with each word over $\Sigma$ .

Definition 3.1.

The extension matrix associated with a letter $a\in\Sigma=\{a_{1},\ldots,a_{m}\}$ is the $m$ -by- $m$ matrix, denoted $M_{a}$ , given by

M_{a}[i,j]=\begin{cases}1&\text{if $a=a_{i}$,}\\ 0&\text{if $i=j$ and $a\neq a_{i}$,}\\ \ast&\text{otherwise.}\end{cases}

(4)

The extension matrix associated with a word $u=u(1)u(2)\cdots u(n)$ over $\Sigma$ is the (tropical) product $M_{u(1)}\cdot M_{u(2)}\cdots M_{u(n)}$ , written $M_{u}$ . In particular $M_{\epsilon}$ is $I_{m}$ , the identity matrix of order $m$ .
Finally, the right-to-left extension matrix $N_{u}$ , is $M_{u^{\mathsf{R}}}$ , i.e., the extension matrix of the mirror word.

Example 3.2.

Assuming $\Sigma=\{\mathtt{A},\mathtt{B},\mathtt{C}\}$ , the above definition leads to

	$\displaystyle M_{\mathtt{A}}=\begin{pmatrix}1&1&1\\ \ast&0&\ast\\ \ast&\ast&0\end{pmatrix}\>,\quad M_{\mathtt{B}}=\begin{pmatrix}0&\ast&\ast\\ 1&1&1\\ \ast&\ast&0\end{pmatrix}\>,\quad M_{\mathtt{C}}=\begin{pmatrix}0&\ast&\ast\\ \ast&0&\ast\\ 1&1&1\end{pmatrix}\>,$
for letters, and
	$\displaystyle M_{\mathtt{CBBCAC}}=\begin{pmatrix}1&1&2\\ 2&2&3\\ 2&2&3\end{pmatrix}\>,\quad N_{\mathtt{CBBCAC}}=\begin{pmatrix}1&3&3\\ 1&2&2\\ 2&3&3\end{pmatrix}\>,$

for a word $u=\mathtt{CBBCAC}$ . Note that $M_{u}$ and $N_{u}$ have no clear relation ( $M_{u}=M_{v}$ does not entail $N_{u}=N_{v}$ ).

We may now return to the table $R_{u}$ . Let us write $C_{u,j}$ for its column number $j$ , recalling that the first column has index 0, and use $C_{u}$ as shorthand for $C_{u,|u|}$ , i.e., the last column. When $u$ is understood, we usually just write $C_{j}$ for $C_{u,j}$ . Finally, we shall mostly display and use the $C_{j}$ ’s as line vectors without bothering to write $C_{j}^{\intercal}$ everywhere.

By definition, $C_{j}$ is $\bigl{(}r(u(,j),a_{1})\>r(u(,j),a_{2})\>\cdots\>r(u(,j),a_{m})\bigr{)}$ . In particular $C_{0}=\bm{e}$ .

Lemma 3.3.

For any word $u$ and any factor $u(j,j^{\prime})$ of $u$ ,

C_{j^{\prime}}=C_{j}\cdot M_{u(j,j^{\prime})}\>.

In particular, $C_{u}=\bm{e}\cdot M_{u}$ .

Proof.

By induction on $j^{\prime}-j$ , the length of the factor of $u$ . The base case $j^{\prime}=j$ is clear since $M_{u(j,j)}$ , i.e., $M_{\epsilon}$ , is the identity matrix. If $j^{\prime}=j+1$ then $u(j,j^{\prime})$ is the letter $u(j^{\prime})=a_{i}$ for some $1\leq i\leq m$ . By definition we have that for all $1\leq k\leq m$ , the $k$ th entry of column $j$ is $(C_{j})_{k}=r(u(,j),a_{k})$ and likewise, the $k$ th entry of column $j+1$ is $(C_{j+1})_{k}=r(u(,j+1),a_{k})$ . Therefore

	$\displaystyle(C_{j+1})_{k}\overset{(\ref{eq-r-rule1})}{=}$	$\displaystyle\begin{cases}r(u(,j),a_{i})+1=(C_{j})_{i}+1&\text{if $i=k,$}\\ \min\bigl{(}r(u(,j),a_{i})+1,r(u(,j),a_{k})\bigr{)}=\min\bigl{(}(C_{j})_{i}+1,% (C_{j})_{k}\bigr{)}&\text{if $i\neq k.$}\end{cases}$
	$\displaystyle=$	$\displaystyle\,C_{j}\cdot M_{a_{i}}\;.$

Finally, if $j^{\prime}>j+1$ , then for any $j^{\prime\prime}$ strictly between $j$ and $j^{\prime}$ we can derive

C_{j}\cdot M_{u(j,j^{\prime})}=C_{j}\cdot M_{u(j,j^{\prime\prime})\cdot u(j^{% \prime\prime},j^{\prime})}\stackrel{{\scriptstyle\!\text{Df.\leavevmode% \nobreak\ \ref{def-ext-mat}}\!}}{{=}}C_{j}\cdot M_{u(j,j^{\prime\prime})}\cdot M% _{u(j^{\prime\prime},j^{\prime})}\stackrel{{\scriptstyle\!\text{i.h.}\!}}{{=}}% C_{j^{\prime\prime}}\cdot M_{u(j^{\prime\prime},j^{\prime})}\stackrel{{% \scriptstyle\!\text{i.h.}\!}}{{=}}C_{j^{\prime}}\>.

using the induction hypothesis. $\hfill\blacktriangleleft$

Mirror results hold for $L_{u}$ once we introduce the required notations. So let us write $D_{u,j}$ for the $j$ th column of $L_{u}$ (counting from left and starting with $j=0$ ) with standard abbreviations $D_{j}$ for $D_{u,j}$ when $u$ is understood, and $D_{u}$ for $D_{u,0}$ . Then for any factor $u(j,j^{\prime})$ of $u$ we have $D_{j}=D_{j^{\prime}}\cdot N_{u(j,j^{\prime})}$ , and in particular $D_{u}=\bm{e}\cdot N_{u}$ .

3.3 Compositional computation of $R_{u}$

The key to computing $r$ - and $\ell$ -tables compositionally is to consider $R_{vu}$ and $L_{uv}$ as functions of $R_{v}$ and $L_{v}$ .

Definition 3.4 (f- $r$ -table).

The functional $r$ -table, denoted $F_{u}$ , associated with a word $u\in\Sigma^{*}$ is a $m$ -by- $(n+1)$ matrix with value

\bm{\delta}_{i,j}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}M_{u(,j)}% \cdot\bm{e}_{i}

(5)

in position $(i,j)$ .

In other words, $\bm{\delta}_{i,j}$ is the $i$ th column of the extension matrix for the $j$ th prefix of $u$ . Note that $F_{u}$ is a table of vectors where $R_{u}$ was a table of values.

Now $F_{u}$ can be used to compute $R_{u}$ via $r\bigl{(}u(,j),a_{i}\bigr{)}=\bm{e}\cdot\bm{\delta}_{i,j}$ (Lemma 3.3), but more generally it provides the $r(w,a_{i})$ ’s for any word $w$ that has some $u(,j)$ as a suffix. Indeed, assume $w=v\,u(,j)$ :

Lemma 3.5.

For any words $u,v\in\Sigma^{*}$ , letter $a_{i}\in\Sigma$ , and position $0\leq j\leq|u|$ :

r\bigl{(}v\,u(,j),a_{i}\bigr{)}=\bm{\delta}_{i,j}\cdot C_{v}\>.

(6)

Proof.

On one hand, $r\bigl{(}v\,u(,j),a_{i}\bigr{)}$ is the $i$ -value in the last column of $R_{v\,u(,j)}$ , i.e., is $C_{v\,u(,j)}\cdot\bm{e}_{i}$ . Now

C_{v\,u(,j)}=\bm{e}\cdot M_{v}\cdot M_{u(,j)}=C_{v}\cdot M_{u(,j)}\>.

We conclude since $\bm{\delta}_{i,j}$ is just $M_{u(,j)}\cdot\bm{e}_{i}$ . $\hfill\blacktriangleleft$

Mirror notions and results exist for $\ell$ . The f- $\ell$ -table associated with $u$ is a $m$ -by- $(n+1)$ matrix, denoted $G_{u}$ , and having value

\bm{\gamma}_{i,j}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}N_{u(j,)}% \cdot\bm{e}_{i}

(7)

in position $(i,j)$ , i.e., $\bm{\gamma}_{i,j}$ is the $i$ th column of $N_{u(j,)}$ . This ensures

\ell\bigl{(}a_{i},u(j,)v\bigr{)}=\bm{\gamma}_{i,j}\cdot D_{v}\;.

(8)

Example 3.6.

Taking $u=\mathtt{CBBCAC}$ as in Example 2.2, $F_{u}$ and $G_{u}$ are

\displaystyle\begin{vmatrix}\!\left(\begin{smallmatrix}0\\ \ast\\ \ast\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}0\\ \ast\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}0\\ 1\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}0\\ 1\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}0\\ 1\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 2\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 2\\ 2\end{smallmatrix}\right)\!\\[10.00002pt] \!\left(\begin{smallmatrix}\ast\\ 0\\ \ast\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 0\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 2\\ 3\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 2\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 2\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 2\\ 2\end{smallmatrix}\right)\!\\[10.00002pt] \!\left(\begin{smallmatrix}\ast\\ \ast\\ 0\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ \ast\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 1\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 1\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 2\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 2\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}2\\ 3\\ 3\end{smallmatrix}\right)\!\end{vmatrix}\text{ and }\begin{vmatrix}\!\left(% \begin{smallmatrix}1\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ \ast\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ \ast\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}0\\ \ast\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}0\\ \ast\\ \ast\end{smallmatrix}\right)\!\\[10.00002pt] \!\left(\begin{smallmatrix}3\\ 2\\ 3\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}3\\ 2\\ 3\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}2\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 0\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ 0\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 0\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ 0\\ \ast\end{smallmatrix}\right)\!\\[10.00002pt] \!\left(\begin{smallmatrix}3\\ 2\\ 3\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}2\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}2\\ 1\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}2\\ \ast\\ 2\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}1\\ \ast\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ \ast\\ 1\end{smallmatrix}\right)\!&\!\!\left(\begin{smallmatrix}\ast\\ \ast\\ 0\end{smallmatrix}\right)\!\end{vmatrix}.

3.4 Summing $r-$ and $\ell$ -tables

In order to compute the $r(u(,i),a)+\ell(a,u(i,))$ required by Equation 1, we now want to combine the two tables, as done in Example 2.2. Recall, however, that these $+$ are actually products in $\mathbb{N}_{\ast}$ , and that the vectors $\bm{\delta}_{i,j}$ and $\bm{\gamma}_{i,j}$ actually encode the linear maps given by $C\mapsto\bm{\delta}_{i,j}\cdot C$ and $D\mapsto\bm{\gamma}_{i,j}\cdot D$ in Eqs. (6) and (8).

Given two tropically linear functions $f(\bm{x})=\bm{\alpha}\cdot\bm{x}$ and $g(\bm{y})=\bm{\beta}\cdot\bm{y}$ from $\mathbb{N}_{\ast}^{m}$ to $\mathbb{N}_{\ast}$ , the tropical product $f\odot g$ , defined via $(f\odot g)(\bm{x},\bm{y})\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}f% (\bm{x})+g(\bm{y})$ , satisfies

	$\displaystyle(f\odot g)(\bm{x},\bm{y})=\bigl{(}\bm{\alpha}\cdot\bm{x}\bigr{)}+% \bigl{(}\bm{\beta}\cdot\bm{y}\bigr{)}$	$\displaystyle=\bigl{(}\min_{1\leq i\leq m}\alpha_{i}+x_{i}\bigr{)}+\bigl{(}% \min_{1\leq j\leq m}\beta_{j}+y_{j}\bigr{)}$
		$\displaystyle=\min_{1\leq i,j\leq m}(\alpha_{i}+\beta_{j}+x_{i}+y_{j})\>.$

It is (tropically) bilinear since $(f\odot g)(\bm{x}\sqcap\bm{x^{\prime}},\bm{y})=(f\odot g)(\bm{x},\bm{y})\sqcap% (f\odot g)(\bm{x^{\prime}},\bm{y})$ and, for a scalar $\alpha\in\mathbb{N}_{\ast}$ , $(f\odot g)(\bm{x},\alpha\cdot\bm{y})=\alpha+(f\odot g)(\bm{x},\bm{y})$ .³³3Here $\alpha\cdot\bm{y}$ is a tropical scaling of a vector, which amounts to increase every value in $\bm{y}$ by $\alpha$ . Such mappings are the natural objects of multilinear algebra, see e.g. [19]. Moving to the tensor space $\mathbb{N}_{\ast}^{m}\otimes\mathbb{N}_{\ast}^{m}$ , we can see $f\odot g$ , a mapping from $\mathbb{N}_{\ast}^{2m}$ to $\mathbb{N}_{\ast}$ , as a tropically linear mapping from $\mathbb{N}_{\ast}^{m^{2}}$ to $\mathbb{N}_{\ast}$ , denoted $f\otimes g$ , and that associates $\bigl{(}\bm{\alpha}\otimes\bm{\beta}\bigr{)}\cdot\bigl{(}\bm{x}\otimes\bm{y}% \bigr{)}$ with the tensor $\bm{x}\otimes\bm{y}$ .

Adapted to tropical algebra, the elementary tensor $\bm{\kappa}=\bm{\alpha}\otimes\bm{\beta}$ obtained as outer product of two vectors of size $m$ and $n$ , respectively, is the $m$ -by- $n$ matrix given by $\kappa_{i,j}=\alpha_{i}+\beta_{j}$ . It is natural to see $\bm{\kappa}$ as a matrix but sometimes, as in the expression $\bigl{(}\bm{\alpha}\otimes\bm{\beta}\bigr{)}\cdot\bigl{(}\bm{x}\otimes\bm{y}% \bigr{)}$ , it is more natural to see the tensors as vectors of size $n m$ .⁴⁴4The size- $n m$ vector is obtained by reading the matrix column by colum, see Example 3.9 below.

Definition 3.7 (Summing $F_{u}$ and $G_{u}$ ).

The functional $r+\ell$ -table associated with a word $u\in\Sigma^{*}$ is the $m$ -by- $(n+1)$ matrix, denoted $H_{u}$ , having $\bm{\eta}_{i,j}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\bm{\delta}% _{i,j}\otimes\bm{\gamma}_{i,j}$ in position $(i,j)$ .

In the above definition, $\bm{\delta}_{i,j}$ and $\bm{\gamma}_{i,j}$ are as in the definition of $F_{u}$ and $G_{u}$ . Finally, every $\bm{\eta}_{i,j}$ entry in $H_{u}$ is a $m$ -by- $m$ tropical tensor, or, equivalently, a tropical vector of size $m^{2}$ .

Let us now consider a word $w$ that contains $u$ as a factor, i.e., assume that $w$ is some $v_{1}u\,v_{2}$ . For a position $j$ in $u$ and a letter $a_{i}$ in $\Sigma$ , one obtains

r\bigl{(}v_{1}u(,j),a_{i}\bigr{)}+\ell\bigl{(}a_{i},u(j,)v_{2}\bigr{)}=\bm{% \eta}_{i,j}\cdot\bigl{(}C_{v_{1}}\otimes D_{v_{2}}\bigr{)}\>,

(9)

by combining Eqs. (6) and (8) with the definition of $\bm{\eta}_{i,j}$ .

If now one wants to find the maximum value among all $r\bigl{(}v_{1}u(,j),a_{i}\bigr{)}+\ell\bigl{(}a_{i},u(j,)v_{2}\bigr{)}$ , it is not necessary to consider all possibilities for $j=0,\ldots,|u|$ and $i=1,\ldots,m$ . Since $C_{v_{1}}\otimes D_{v_{2}}$ does not change with $i$ and $j$ , and since the dot-product in Eq. (9) is monotonic with respect to the component-wise order, we do not need the full $H_{u}$ table to compute $h(u)$ but only a set of its maximal elements. This leads to:

Definition 3.8 (Piecewise signatures).

Assume a fixed alphabet $\Sigma=\{a_{1},\ldots,a_{m}\}$ .

$\blacksquare$

The residual tensors associated with a word $u\in\Sigma^{*}$ is the set $E_{u}=\{\bm{\eta},\bm{\eta^{\prime}},\ldots\}$ of all maximal elements in $H_{u}$ .
$\blacksquare$

The (piecewise) signature of $u$ is the triple $\sigma(u)\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}(E_{u},M_{u},N_{u})$ collecting $E_{u}$ and the two extension matrices.

Note that $E_{u}$ is never empty and usually contains several elements since the ordering on $\mathbb{N}_{\ast}^{m^{2}}$ is only a partial ordering when $m>1$ .

Example 3.9.

We continue with $u=\mathtt{CBBCAC}$ . Here $H_{u}$ contains $3\times 7=21$ tensors obtained as the pairwise products $\bm{\delta}_{i,j}\otimes\bm{\gamma}_{i,j}$ of the vectors from $F_{u}$ and $G_{u}$ listed earlier, see Example 3.6. It has seven (pairwise incomparable) maximal elements. They are

	$\displaystyle\bm{\eta}_{\mathtt{A},0}$	$\displaystyle=\begin{pmatrix}0&\ast&\ast\end{pmatrix}\otimes\begin{pmatrix}1&1% &2\end{pmatrix}=\begin{pmatrix}1&1&2&\ast&\ast&\ast&\ast&\ast&\ast\end{pmatrix% }\>,$
	$\displaystyle\bm{\eta}_{\mathtt{B},0}$	$\displaystyle=\begin{pmatrix}\ast&0&\ast\end{pmatrix}\otimes\begin{pmatrix}3&2% &3\end{pmatrix}=\begin{pmatrix}\ast&\ast&\ast&3&2&3&\ast&\ast&\ast\end{pmatrix% }\>,$
	$\displaystyle\bm{\eta}_{\mathtt{C},0}$	$\displaystyle=\begin{pmatrix}\ast&\ast&0\end{pmatrix}\otimes\begin{pmatrix}3&2% &3\end{pmatrix}=\begin{pmatrix}\ast&\ast&\ast&\ast&\ast&\ast&3&2&3\end{pmatrix% }\>,$
	$\displaystyle\bm{\eta}_{\mathtt{C},3}$	$\displaystyle=\begin{pmatrix}\ast&1&1\end{pmatrix}\otimes\begin{pmatrix}2&\ast% &2\end{pmatrix}=\begin{pmatrix}\ast&\ast&\ast&3&\ast&3&3&\ast&3\end{pmatrix}\>,$
	$\displaystyle\bm{\eta}_{\mathtt{C},5}$	$\displaystyle=\begin{pmatrix}1&2&2\end{pmatrix}\otimes\begin{pmatrix}\ast&\ast% &1\end{pmatrix}=\begin{pmatrix}\ast&\ast&2&\ast&\ast&3&\ast&\ast&3\end{pmatrix% }\>,$
	$\displaystyle\bm{\eta}_{\mathtt{A},6}$	$\displaystyle=\begin{pmatrix}1&2&2\end{pmatrix}\otimes\begin{pmatrix}0&\ast&% \ast\end{pmatrix}=\begin{pmatrix}1&\ast&\ast&2&\ast&\ast&2&\ast&\ast\end{% pmatrix}\>,$
	$\displaystyle\bm{\eta}_{\mathtt{B},6}$	$\displaystyle=\begin{pmatrix}1&2&2\end{pmatrix}\otimes\begin{pmatrix}\ast&0&% \ast\end{pmatrix}=\begin{pmatrix}\ast&1&\ast&\ast&2&\ast&\ast&2&\ast\end{% pmatrix}\>.$

Note that $\bm{\eta}_{\mathtt{C},5}$ coincides with $\bm{\eta}_{\mathtt{C},6}$ , obtained as $\begin{pmatrix}2&3&3\end{pmatrix}\otimes\begin{pmatrix}\ast&\ast&0\end{pmatrix}$ : however $E_{u}$ only contains one copy of each maximal tensor. Similarly one has $\bm{\eta}_{\mathtt{C},0}=\bm{\eta}_{\mathtt{C},1}$ and $\bm{\eta}_{\mathtt{C},4}=\bm{\eta}_{\mathtt{C},5}$ .

The following lemma, needed in Section 4, can be seen as another example.

Lemma 3.10.

For a letter $a\in\Sigma$ where $|\Sigma|=m$ , $E_{a}$ contains $2m-1$ maximal tensors.

Proof.

Assume $a=a_{s}$ . There are $2m$ vectors $\bm{\delta}_{i,j}$ in $F_{a}$ , with $1\leq i\leq m$ and $0\leq j\leq 1$ . The table $G_{a}$ is the mirror image, i.e., $\bm{\gamma}_{i,j}=\bm{\delta}_{i,1-j}$ . The $2m$ tensors in $H_{a}$ are all incomparable except for $\bm{\delta}_{s,0}\otimes\bm{\gamma}_{s,0}=\bm{\delta}_{s,1}\otimes\bm{\gamma}_% {s,1}$ , hence the claim. $\hfill\blacktriangleleft$

The signature of a word $u$ , and more precisely the tensors in $E_{u}$ , provides enough information to compute $h(u)$ :

Theorem 3.11 (Piecewise complexity from signatures).

For any word $u\in\Sigma^{*}$

h(u)=1+\max_{\bm{\eta}\in E_{u}}\bm{\eta}\cdot\bm{e}\>.

(10)

Proof.

The proof is simple. Starting with Eq. (1), one has

	$\displaystyle h(u)$	$\displaystyle=\max_{\begin{subarray}{c}u=u_{1}u_{2}\\ a_{i}\in A\end{subarray}}1+r(u_{1},a_{i})+\ell(a_{i},u_{2})=1+\max_{\begin{% subarray}{c}0\leq j\leq\|u\|\\ a_{i}\in A\end{subarray}}r\bigl{(}u(,j),a_{i}\bigr{)}+\ell\bigl{(}a_{i},u(j,)% \bigr{)}+1$
		$\displaystyle=1+\max_{\begin{subarray}{c}0\leq j\leq\|u\|\\ a_{i}\in\Sigma\end{subarray}}\bm{\eta}_{i,j}\cdot\bigl{(}C_{\epsilon}\otimes D% _{\epsilon}\bigr{)}=1+\max_{\begin{subarray}{c}0\leq j\leq\|u\|\\ a_{i}\in\Sigma\end{subarray}}\bm{\eta}_{i,j}\cdot\bigl{(}\bm{e}\otimes\bm{e}% \bigr{)}\;,$
using Eq. (9) with $v_{1},v_{2}=\epsilon$ ,
		$\displaystyle=1+\max_{\begin{subarray}{c}0\leq j\leq\|u\|\\ a_{i}\in\Sigma\end{subarray}}\bm{\eta}_{i,j}\cdot\bm{e}\>,$
where it is understood that now $\bm{e}$ denotes a size- $m^{2}$ tensor,
		$\displaystyle=1+\max_{\bm{\eta}\in E_{u}}\bm{\eta}\cdot\bm{e}\>,$

since dot-product is monotonic w.r.t. $\leq$ . $\hfill\blacktriangleleft$ Note that, for $\bm{\eta}$ an $m^{2}$ -sized vector, the product $\bm{\eta}\cdot\bm{e}$ in Eq. (10) is just the minimal entry in $\bm{\eta}$ . However the notation $\bm{\eta}\cdot\bm{e}$ is quite convenient, and we shall extend the above reasoning with tensors different from $\bm{e}$ .

Example 3.12.

Continuing with $u=\mathtt{CBBCAC}$ , the tensor in $E_{u}$ that has the largest minimal component is $\bm{\eta}_{\mathtt{C},3}$ (see Example 3.9), yielding $\bm{\eta}_{\mathtt{C},3}\cdot\bm{e}=3$ and $h(u)=4$ , in agreement with Example 2.2.

For complexity analysis we shall assume that signatures are simply represented as two $m$ -by- $m$ matrices together with a collection of $m^{2}$ -sized residual tensors. The values contained in theses matrices and tensors are usually encoded using a logarithmic number of bits (e.g., they are written in binary).

Theorem 3.13 (Space complexity of piecewise signatures).

The size $|\sigma(u)|$ of the signature of a word $u\in\Sigma^{*}$ is in $\log(n)\cdot m^{O(1)}$ where $n=|u|$ and $m=|\Sigma|$ .

Proof.

Let us write $\#E_{u}$ for the cardinal of $E_{u}$ . Now $\sigma(u)$ contains two matrices and $\#E_{u}$ tensors, each made up of $m^{2}$ scalars. These scalars are bounded by $|u|$ (or are $\ast$ ) so only use $O(\log(n))$ space. It is therefore enough to show that $\#E_{u}$ is bounded by a polynomial of $m$ . This is done in Section 5 where we prove $\#E_{u}\leq 2m(2m+1)$ (see Theorem 5.1). $\hfill\blacktriangleleft$

3.5 Combining piecewise signatures

Since the objects in signatures can accommodate concatenation, it is possible to compute them compositionally. The main result of this section is stated as

Theorem 3.14 (Compositionality).

The signature $\sigma(u\,v)$ of the concatenation of two words $u,v\in\Sigma^{*}$ can be computed from $\sigma(u)$ and $\sigma(v)$ .
Furthermore the computation of $\sigma(uv)$ can be done in time polynomial in $|\sigma(u)|+|\sigma(v)|+|\Sigma|$ .

The proof is developed in several steps.

Lemma 3.15.

Let $w=u\,v$ . For $0\leq j\leq|u|$ , the tensor $(H_{w})_{i,j}$ at position $(i,j)$ in $H_{w}$ is $(I_{m}\otimes N_{v})\cdot(H_{u})_{i,j}$ Similarly, for $0\leq j\leq|v|$ , the tensor $(H_{w})_{i,|u|+j}$ in $H_{w}$ is $(M_{u}\otimes I_{m})\cdot(H_{v})_{i,j}$ .

The above statement uses tensor products of linear maps. Recall that when $f,g:\mathbb{N}_{\ast}^{n}\to\mathbb{N}_{\ast}^{m}$ are two linear maps, their tensor product $f\otimes g:\mathbb{N}_{\ast}^{n^{2}}\to\mathbb{N}_{\ast}^{m^{2}}$ can be defined via $(f\otimes g)(\mathop{\mathchoice{\rotatebox[origin={c}]{180.0}{$\displaystyle% \bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$\textstyle\bigsqcup$}}{\rotatebox[% origin={c}]{180.0}{$\scriptstyle\bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$% \scriptscriptstyle\bigsqcup$}}}_{\bm{i}}\bm{\alpha_{i}}\otimes\bm{\beta_{i}})=% \mathop{\mathchoice{\rotatebox[origin={c}]{180.0}{$\displaystyle\bigsqcup$}}{% \rotatebox[origin={c}]{180.0}{$\textstyle\bigsqcup$}}{\rotatebox[origin={c}]{1% 80.0}{$\scriptstyle\bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$% \scriptscriptstyle\bigsqcup$}}}_{\bm{i}}f(\bm{\alpha_{i}})\otimes g(\bm{\beta_% {i}})$ . When $f$ and $g$ are given by two $n$ -by- $m$ matrices $A$ and $B$ , one can obtain a $n^{2}$ -by- $m^{2}$ matrix, denoted $A\otimes B$ , for $f\otimes g$ by taking the products of the rows of $A$ and the rows of $B$ (as expected since $(\bm{e}_{i}\otimes\bm{e}_{j})_{1\leq i,j\leq n}$ is a base of $\mathbb{N}_{\ast}^{n}\otimes\mathbb{N}_{\ast}^{n}$ ).

Proof (of Lemma 3.15).

We only prove the first claim since the second can be obtained directly by mirroring. Assume $0\leq j\leq|u|$ . By definition, $(H_{w})_{i,j}$ is $(F_{w})_{i,j}\otimes(G_{w})_{i,j}$ . Now, by Eq. (5),

	$\displaystyle(F_{w})_{i,j}$	$\displaystyle=M_{w(,j)}\cdot\bm{e}_{i}=M_{u(,j)}\cdot\bm{e}_{i}=(F_{u})_{i,j}\>,$
while, by Eq. (7),
	$\displaystyle(G_{w})_{i,j}$	$\displaystyle=N_{w(j,)}\cdot\bm{e}_{i}=N_{u(j,)v}\cdot\bm{e}_{i}=N_{v}\cdot N_% {u(j,)}\cdot\bm{e}_{i}=N_{v}\cdot(G_{u})_{i,j}\>.$
Thus
	$\displaystyle(H_{w})_{i,j}$	$\displaystyle=(F_{u})_{i,j}\otimes\bigl{(}N_{v}\cdot(G_{u})_{i,j}\bigr{)}=(I_{% m}\otimes N_{v})\cdot\bigl{(}(F_{u})_{i,j}\otimes(G_{u})_{i,j}\bigr{)}=(I_{m}% \otimes N_{v})\cdot(H_{u})_{i,j}$

as claimed. $\hfill\blacktriangleleft$

Lemma 3.16.

Let $w=u\,v$ . The tensors in $E_{w}$ are exactly the maximal tensors in $E^{\prime}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\bigl{\{}(I_{m}% \otimes N_{v})\cdot\bm{\eta}\>\big{|}\>\bm{\eta}\in E_{u}\bigr{\}}\cup\bigl{\{% }(M_{u}\otimes I_{m})\cdot\bm{\eta^{\prime}}\>\big{|}\>\bm{\eta^{\prime}}\in E% _{v}\bigr{\}}$ .

Proof.

First we claim that $E_{w}\subseteq E^{\prime}$ . Indeed, consider $(H_{w})_{i,j}\in E_{w}$ . We shall assume $0\leq j\leq|u|$ since the other case is symmetric. By Lemma 3.15, $(H_{w})_{i,j}$ is $(I_{m}\otimes N_{v})\cdot(H_{u})_{i,j}$ . If $(H_{u})_{i,j}\in E_{u}$ we have proved $(H_{w})_{i,j}\in E^{\prime}$ . If, on the other hand, $(H_{u})_{i,j}\not\in E_{u}$ , then there is some maximal $(H_{u})_{i^{\prime},j^{\prime}}\in E_{u}$ that strictly dominates $(H_{u})_{i,j}$ . Since linear applications are monotonic we deduce

(H_{w})_{i^{\prime},j^{\prime}}=(I_{m}\otimes N_{v})\cdot(H_{u})_{i^{\prime},j% ^{\prime}}\geq(I_{m}\otimes N_{v})\cdot(H_{u})_{i,j}=(H_{w})_{i,j}\>.

Thus, and since $(H_{w})_{i,j}$ is a maximal element of $H_{w}$ , necessarily $(H_{w})_{i,j}$ and $(H_{w})_{i^{\prime},j^{\prime}}$ coincide, so $(H_{w})_{i,j}$ is indeed the image by $I_{m}\otimes N_{v}$ of a tensor from $E_{u}$ , hence is in $E^{\prime}$ .

Now, Lemma 3.15 entails that $E^{\prime}$ is included in $H_{w}$ . So $E_{w}=\max H_{w}\subseteq E^{\prime}\subseteq H_{w}$ . We conclude that $E_{w}$ and $\max E^{\prime}$ coincide. $\hfill\blacktriangleleft$

Proof of Theorem 3.14.

With $\sigma(u)=(E_{u},M_{u},N_{u})$ and $\sigma(v)=(E_{v},M_{v},N_{v})$ one obtains $\sigma(w)$ as follows:
– $M_{w}$ and $N_{w}$ are matrix products $M_{u}\cdot M_{v}$ and $N_{v}\cdot N_{u}$ ;
– $E_{w}$ is obtained by computing $E^{\prime}$ (from Lemma 3.16) and removing the non maximal elements.
All these operations are clearly polynomial-time. $\hfill\blacktriangleleft$

4 Piecewise complexity of SLP-compressed words

An SLP is an acyclic context-free grammar where each non-terminal has only one production rule, i.e., the grammar is deterministic and generates a single word. When a long word has many repetitions it can often be described by an SLP of smaller size and thus SLPs can be used as a compressed data structure for words. Importantly, most compression schemes used for texts and files are equivalent, via polynomial-time encodings, to the mathematically more elegant SLPs. We refer to [17] for more background and details.

Formally, an SLP $X$ with $k$ rules is an alphabet $\Sigma$ together with a list $\langle N_{1}\to\rho_{1};\cdots;N_{k}\to\rho_{k}\rangle$ of production rules where each right-hand side $\rho_{i}$ is either a letter $a$ from $\Sigma$ or a concatenation $N_{j}\,N_{j^{\prime}}$ of two nonterminals with $j,j^{\prime}<i$ . The size of an SLP, denoted $|X$ |, is in $O(m+k\log k)$ , where $m=|\Sigma|$ .

In the context of $X$ , the expansion of a nonterminal $N_{i}$ is a word $\operatorname{\textit{exp}}(N_{i})\in\Sigma^{*}$ defined inductively via:

\operatorname{\textit{exp}}(N_{i})\stackrel{{\scriptstyle\mbox{\scriptsize def% }}}{{=}}\begin{cases}a&\text{if $\rho_{i}=a$,}\\ \operatorname{\textit{exp}}(N_{j})\operatorname{\textit{exp}}(N_{j^{\prime}})&% \text{if $\rho_{i}=N_{j}\,N_{j^{\prime}}$.}\end{cases}

Finally, the expansion of the SLP itself, written $\operatorname{\textit{exp}}(X)$ , is the expansion $\operatorname{\textit{exp}}(N_{k})$ of its last nonterminal. Note that the length of $\operatorname{\textit{exp}}(X)$ can reach $2^{k}-1$ , hence in $2^{O(|X|)}$ .

Theorem 4.1 (Piecewise signatures of compressed words).

Given an SLP $X$ , one can compute $\sigma(\operatorname{\textit{exp}}(X))$ in time $k^{2}\cdot m^{O(1)}$ where $k$ is the number of rules of $X$ and $m$ is the size of its alphabet.

Proof.

The algorithm computes piecewise signatures $\sigma_{1},\ldots,\sigma_{k}$ for each non-terminal of $X$ , i.e., with $\sigma_{i}=\sigma(\operatorname{\textit{exp}}(N_{i}))$ . There are two cases to handle:

1.

when $N_{i}$ is given by a rule $N_{i}\to a$ , the signature of $a$ has extension matrices given by Eq. (4) and $E_{a}$ given by Lemma 3.10;
2.

when $N_{i}$ is given by a rule $N_{i}\to N_{j}\,N_{j^{\prime}}$ , the signature $\sigma_{i}$ is obtained by combining $\sigma_{j}$ and $\sigma_{j^{\prime}}$ as shown in Theorem 3.14.

These $k$ steps can each be performed in time $k\cdot m^{O(1)}$ since, by Theorem 3.13, each $\sigma_{i}$ has size bounded by $\log(|\operatorname{\textit{exp}}(N_{i})|)m^{O(1)}$ , hence in $k\cdot m^{O(1)}$ . $\hfill\blacktriangleleft$ Combining with Theorem 3.11 we obtain:

Corollary 4.2.

The piecewise complexity $h(X)$ of (the word represented by) an SLP $X$ can be computed in time $k^{2}\cdot m^{O(1)}$ .

5 The size of piecewise signatures

Fix some alphabet $\Sigma=\{a_{1},\dots,a_{m}\}$ . The goal of this section is to bound the number of tensors, written $\#E_{u}$ , in any $E_{u}$ set. Such bounds are crucial for the complexity results in Theorems 3.13, 3.14, and 4.1. We first show that $\#E_{u}$ is bounded in $O(m^{2})$ . This is later complemented with Theorem 5.5 where we exhibit a family of witnesses proving an $\Omega(m^{2})$ lower bound.

Theorem 5.1.

For any $u\in\Sigma^{*}$ , $\#E_{u}\leq 2m(2m+1)$ where $m=|\Sigma|$ .

The proof of Theorem 5.1 is organized in three main lemmas.

For any $1\leq j\leq m$ , for any $u\in\Sigma^{*}$ , let $M_{u}[\cdot,j]\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}M_{u}\cdot{% \bm{e}}_{j}$ denote the $j$ th column of $M_{u}$ . The columns of $M_{u}$ respect some patterns that are easier to analyze if we assume that the letters of $\Sigma$ are listed in the order they appear in $u$ (with letters not in $u$ listed at the end). This is no loss of generality since a permutation of $\Sigma$ just amounts to a permutation of the indices in our matrices. Wherever we make that assumption, we say that $u$ and $\Sigma$ are aligned, or sometimes more succinctly that “ $u$ is aligned”.

Lemma 5.2.

Assume that $u$ is aligned and only uses letters $a_{1},\ldots,a_{p}$ with $p\leq m$ . Then every column $M_{u}[\cdot,j]$ agrees with one of the following two patterns:

1.

If $1\leq j\leq p$ , then

$M_{u}[\cdot,j]=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-27-48.pdf2svg.svg}\end{matrix}$ $=T(n,i,p)$

for some $1\leq n$ and some $0\leq i<p$ . We write $T(n,i,p)$ for this vector.
2.

If $p<j\leq m$ then

$M_{u}[\cdot,j]=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-27-57.pdf2svg.svg}\end{matrix}$ $=U(j,p)$

and we write $U(j,p)$ for this vector.

Proof.

By induction on the length of $u$ . For $|u|=0$ we have $u=\epsilon$ , $p=0$ , $M_{u}=I_{m}$ , so $p<j$ and indeed $M_{u}[\cdot,j]={\bm{e}}_{j}=U(j,0)=U(j,p)$ for all $j=1,\ldots,k$ .

Now assume that $u=v\,a_{q}$ is aligned and that the claim holds for $v$ . First let us see how the columns of $M_{u}$ are obtained from the columns of $M_{v}$ : for any $j$ we have $M_{u}[\cdot,j]=(M_{v}\cdot M_{a_{q}})[\cdot,j]=M_{v}\cdot(M_{a_{q}}[\cdot,j])$ . With Eq. (4) we see

M_{a_{q}}[\cdot,j]=\begin{cases}1\cdot{\bm{e}}_{q}=1\cdot{\bm{e}}_{j}&\text{ % if $j=q$, and}\\ (1\cdot{\bm{e}}_{q})\sqcap{\bm{e}}_{j}&\text{ otherwise.}\end{cases}

(

\dagger

)

Note that the terms in ( $\dagger$ ‣ 5) involve a tropical scaling $1\cdot{\bm{e}}_{q}$ which means “increase every value in ${\bm{e}}_{q}$ by 1”, while the second term contains a tropical sum “ $\_\sqcap\_$ ” which means “collect $\min$ s componentwise”. Finally $M_{u}[\cdot,j]$ is $1\cdot M_{v}\cdot{\bm{e}}_{q}=1\cdot M_{v}[\cdot,q]$ if $j=q$ , and is $(1\cdot M_{v}[\cdot,q])\sqcap M_{v}[\cdot,j]$ , simply written ${\mathop{\mathchoice{\rotatebox[origin={c}]{180.0}{$\displaystyle\bigsqcup$}}{% \rotatebox[origin={c}]{180.0}{$\textstyle\bigsqcup$}}{\rotatebox[origin={c}]{1% 80.0}{$\scriptstyle\bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$% \scriptscriptstyle\bigsqcup$}}}}$ , otherwise.

We consider whether $a_{q}$ occurs in $v$ :

If $a_{q}$ occurs in $v$ then $M_{v}[\cdot,q]=T(n,i,p)$ for some $1\leq n$ , $0\leq i<p$ , hence $M_{u}[\cdot,q]=1\cdot T(n,i,p)=T(n+1,i,p)$ . Now for $j\neq q$ , $M_{v}[\cdot,j]$ is of the form $T(n^{\prime},i^{\prime},p)$ for some $1\leq n^{\prime},0\leq i^{\prime}<p$ if $j\leq p$ , or $U(j,p)$ if $j>p$ . If $M_{v}[\cdot,j]=T(n^{\prime},i^{\prime},p)$ then $M_{u}[\cdot,j]={\mathop{\mathchoice{\rotatebox[origin={c}]{180.0}{$% \displaystyle\bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$\textstyle\bigsqcup$}% }{\rotatebox[origin={c}]{180.0}{$\scriptstyle\bigsqcup$}}{\rotatebox[origin={c% }]{180.0}{$\scriptscriptstyle\bigsqcup$}}}}=T(\min(n+1,n^{\prime}),i^{\prime% \prime},p)$ with $i^{\prime\prime}$ being $i$ or $i^{\prime}$ . If $M_{v}[\cdot,j]=U(j,p)$ then observe that $T(n+1,i,p)\sqcap U(j,p)=U(j,p)$ .

Similarly, if $a_{q}$ does not occur in $v$ , (in which case $q=p$ by alignment assumption) then $M_{v}[\cdot,q]=U(p,p-1)$ , hence $M_{u}[\cdot,q]=1\cdot M_{v}[\cdot,q]=T(1,p-1,p)$ . For $j\neq q$ , $M_{v}[\cdot,j]$ is of the form $T(n^{\prime},i^{\prime},p-1)$ for some $1\leq n^{\prime},0\leq i^{\prime}<p-1$ if $j\leq p-1$ , or $U(j,p-1)$ if $j>p-1$ . If $M_{v}[\cdot,j]=T(n^{\prime},i^{\prime},p-1)$ then ${\mathop{\mathchoice{\rotatebox[origin={c}]{180.0}{$\displaystyle\bigsqcup$}}{% \rotatebox[origin={c}]{180.0}{$\textstyle\bigsqcup$}}{\rotatebox[origin={c}]{1% 80.0}{$\scriptstyle\bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$% \scriptscriptstyle\bigsqcup$}}}}=T(1,i^{\prime\prime},p)$ with $i^{\prime\prime}$ being $i^{\prime}$ or $p-1$ ; if $M_{v}[\cdot,j]=U(j,p-1)$ then ${\mathop{\mathchoice{\rotatebox[origin={c}]{180.0}{$\displaystyle\bigsqcup$}}{% \rotatebox[origin={c}]{180.0}{$\textstyle\bigsqcup$}}{\rotatebox[origin={c}]{1% 80.0}{$\scriptstyle\bigsqcup$}}{\rotatebox[origin={c}]{180.0}{$% \scriptscriptstyle\bigsqcup$}}}}=U(j,p)$ .

Finally, all cases yield results that agree with the claim and the proof is completed. $\hfill\blacktriangleleft$ When $u$ is not aligned, the patterns above still apply but modulo a permutation of the lines and columns of $M_{u}$ . Finally, the patterns also apply to $N_{u}$ , i.e., to $M_{u^{\mathsf{R}}}$ , but now alignment considers the letters of $u$ in their order of last appearance.

Lemma 5.3.

If all the letters of $\Sigma$ occur in $u$ , then $\#E_{u}\leq 2m(2m+1)$ .

Proof.

Since $E_{u}$ collects the maximal elements of $H_{u}$ , it is in particular an antichain of $H_{u}$ and we can bound its size by the size of the largest antichains in $H_{u}$ (also known as the width of $H_{u}$ , as in Dilworth’s Theorem). Recall that the tensors in $H_{u}$ are partially ordered by $\leq$ .

Recall also, from Definition 3.7, that a tensor of $H_{u}$ is obtained by combining a $\bm{\delta}_{i,j}$ from $F_{u}$ , with a $\bm{\gamma}_{i,j}$ from $G_{u}$ , i.e., builing $M_{v}[\cdot,i]\otimes M_{w}[\cdot,i]$ for a factorization $u=v\,w$ and a letter $a_{i}\in\Sigma$ . To simplify notations, when we consider a prefix $v$ of $u$ , we assume that the letters in $\Sigma$ are listed in the order they appear in $u$ . When looking at $N_{w}$ for $w$ a suffix of $u$ we assume that $\Sigma$ is listed in order of last appearances in $u$ : making the two different assumptions simultaneously is not a problem since they apply to vectors that will be combined via a tensor product, and the two implicit reindexings can be carried out independently on the two factors.

Now remember that, according to Lemma 5.2, for any $1\leq j\leq m$ , for any prefix $v$ of $u$ , $M_{v}[\cdot,j]$ is given by:

1.

If $a_{j}$ appears in $v$ then

$M_{v}[\cdot,j]=T(n,i,p)=\begin{matrix}\includegraphics{dagpub-standalone-% manual-dagpub-svg-displaymath-2025-07-24-19-28-45.pdf2svg.svg}\end{matrix}$

for some $0\leq i<p\leq m$ .
2.

If $a_{j}$ does not appear in $v$ then

$M_{v}[\cdot,j]=U(j,p)=\begin{matrix}\includegraphics{dagpub-standalone-manual-% dagpub-svg-displaymath-2025-07-24-19-28-56.pdf2svg.svg}\end{matrix}$

for some $0\leq p<j$ . In both cases the letters $a_{p+1}$ to $a_{m}$ do not appear in $v$ .

Similarly, for any $1\leq j\leq m$ , for any suffix $w$ of $u$ , $N_{w}[\cdot,j]$ matches one of the two patterns. Furthermore, for any factorization $u=vw$ , we observe that $M_{v}[\cdot,j]$ and $N_{w}[\cdot,j]$ cannot both match the second pattern since $a_{j}$ has to occur in $v$ or $w$ .

First let us bound the size of the antichains of

S_{p,p^{\prime}}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\left\{\;{% T(n,i,p)\otimes T(n^{\prime},i^{\prime},p^{\prime})}\;\middle|\;{n,n^{\prime}% \in\mathbb{N},0\leq i<p,0\leq i^{\prime}<p^{\prime}}\;\right\}

with $1\leq p,p^{\prime}\leq m$ fixed. Representing tensors in matrix form, and ignoring the $\ast$ rows and columns, we obtain matrices of the form

T(n,i,p)\otimes T(n^{\prime},i^{\prime},p^{\prime})\;=\;\begin{matrix}% \includegraphics{dagpub-standalone-manual-dagpub-svg-displaymath-2025-07-24-19% -29-11.pdf2svg.svg}\end{matrix}

with $x=n+n^{\prime}$ . Thus two tensors $T(n_{1},i,p)\otimes T(n_{1}^{\prime},i_{1}^{\prime},p^{\prime})$ and $T(n_{2},i,p)\otimes T(n_{2}^{\prime},i_{2}^{\prime},p^{\prime})$ are always comparable:

$\blacksquare$

if $n_{1}+n^{\prime}_{1}<n_{2}+n^{\prime}_{2}$ then $T(n_{1},i,p)\otimes T(n_{1}^{\prime},i_{1}^{\prime},p^{\prime})\leq T(n_{2},i,% p)\otimes T(n_{2}^{\prime},i_{2}^{\prime},p^{\prime})$ ;
$\blacksquare$

if $n_{1}+n^{\prime}_{1}>n_{2}+n^{\prime}_{2}$ then $T(n_{1},i,p)\otimes T(n_{1}^{\prime},i_{1}^{\prime},p^{\prime})\geq T(n_{2},i,% p)\otimes T(n_{2}^{\prime},i_{2}^{\prime},p^{\prime})$ ;
$\blacksquare$

otherwise $T(n_{1},i,p)\otimes T(n_{1}^{\prime},i_{1}^{\prime},p^{\prime})\leq T(n_{2},i,% p)\otimes T(n_{2}^{\prime},i_{2}^{\prime},p^{\prime})$ iff $i^{\prime}_{1}\leq i^{\prime}_{2}$ .

Since $0\leq i<p$ can have at most $p$ distinct values, the longest antichain of $S_{p,p^{\prime}}$ is of size at most $p$ . But we can reason symmetrically on $i^{\prime}$ , so in fine the longest antichain of $S_{p,p^{\prime}}$ is of size at most $\min(p,p^{\prime})$ .

Let us now bound the size of the antichains of

R_{p,p^{\prime}}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\left\{\;{% U(j,p)\otimes T(n^{\prime},i^{\prime},p^{\prime})}\;\middle|\;{p<j\leq m,n^{% \prime}\in\mathbb{N},0\leq i^{\prime}<p^{\prime}}\;\right\}

with $0\leq p<m$ and $0<p^{\prime}\leq m$ . Any two elements of $R_{p,p^{\prime}}$ are comparable if they have identical $U(j,p)$ . Since $p<j\leq m$ there are $(m-p)$ distinct $U(j,p)$ , hence any antichain of $R_{p,p^{\prime}}$ is of size at most $m-p$ .

Symmetrically,

L_{p,p^{\prime}}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\left\{\;{% T(n,i,p)\otimes U(j^{\prime},p^{\prime})}\;\middle|\;{n\in\mathbb{N},0\leq i<p% ,p^{\prime}<j^{\prime}\leq m}\;\right\}

with $0<p\leq m$ and $0\leq p^{\prime}<m$ , has antichains of size at most $m-p^{\prime}$ .

Now observe that

H_{u}\subseteq\bigcup_{(p,p^{\prime})\in P}S(p,p^{\prime})\cup R(p,p^{\prime})% \cup L(p,p^{\prime})\>,

where $P$ is the set of all $(p,p^{\prime})$ such that there exists a factorization $u=vw$ with $v$ and $w$ having respectively $p$ and $p^{\prime}$ distinct letters. The pairs $(p,p^{\prime})$ in $P$ depends on the order of appearances and disappearances of letters in $u$ , hence they can be ordered such that $p$ is increasing and $p^{\prime}$ is decreasing. Thus $P$ is of size at most $2m+1$ . Moreover, $\min(p,p^{\prime})+m-p+m-p^{\prime}\leq 2m$ . Summing all three bounds, we end up seeing that, as claimed, any antichain in $H_{u}$ has size bounded by

\sum_{(p,p^{\prime})\in P}\min(p,p^{\prime})+m-p+m-p^{\prime}\leq 2m(2m+1)\>.\

$\hfill\blacktriangleleft$

When $u$ does not use all letters of $\Sigma$ we write $\Gamma$ for the subset of $\Sigma$ that collect the letters occurring in $u$ . Now we have two $E_{u}$ sets depending on whether we consider the underlying alphabet to be $\Gamma$ or $\Sigma$ . Accordingly we write $E_{u}^{\Gamma}$ and $E_{u}^{\Sigma}$ (and also $M_{u}^{\Gamma}$ , etc.)

Lemma 5.4.

Assume $|\Gamma|=m-1$ . Then $\#E_{u}^{\Sigma}\leq\#E_{u}^{\Gamma}+2m-1$ .

(See Section A.1 for the proof). We can now wrap up this subsection.

Proof of Theorem 5.1.

Write $m^{\prime}$ for $|\Gamma|$ . By Lemma 5.3, $\#E_{u}^{\Gamma}\leq 2m^{\prime}(2m^{\prime}+1)$ . Invoking Lemma 5.4 $m-m^{\prime}$ times we obtain the required

	$\displaystyle\#E_{u}^{\Sigma}$	$\displaystyle\leq\#E_{u}^{\Gamma}+(2m^{\prime}+1)+\cdots+2m-1$
		$\displaystyle\leq 2m^{\prime}(2m^{\prime}+1)+\sum_{m^{\prime}<i\leq m}(2i-1)$
		$\displaystyle\leq 2m(2m+1)\>.\$

$\hfill\blacktriangleleft$ We conclude this section with a matching lower bound:

Theorem 5.5 ( $\#E_{u}$ is in $\Omega(m^{2})$ ).

For each $k=1,2,\ldots$ , there exists a word $u_{k}$ over a $k$ -letter alphabet such that $\#E_{u_{k}}\geq k(k+1)$ .

Describing and analyzing $u_{k}$ is quite involved so the proof has been relegated to Section A.2.

6 Concluding remarks

We developed a tropical algebraic approach to the computation of the $r$ and $\ell$ side distances that are the basis for the piecewise complexity of words. This allowed us to construct compact piecewise signatures and combine them elegantly and efficiently. An outcome of this approach is a polynomial-time algorithm for the piecewise complexity of compressed words. The proof that piecewise signatures have at most $O(m^{2})$ elements, where $m$ is the size of the alphabet, is technically more involved but is made clearer in the tropical framework.

One can certainly derive more results from our tropical approach to piecewise complexity. For example, recall that the subword universality index $\iota(u)$ of a word $u$ is the largest $L\in\mathbb{N}$ such that $u$ contains all the length- $L$ words of $\Sigma^{*}$ as subwords [1, 5].

Theorem 6.1.

$\iota(u)=0$ iff $\ast$ occurs in $M_{u}$ . When $\ast$ does not occur in $M_{u}$ then $\iota(u)$ is the value of the minimal entry in $M_{u}$ .

(See appendix for a proof.) This provides a simpler way (compared to [25]) of computing the universality index of SLP-compressed words in polynomial-time.

We hope that a tropical viewpoint can similarly profit related investigations on piecewise distance, canonical representatives modulo Simon’s congruence, piecewise separability, or piecewise-testable languages and associated automata.

The tropical setting also leads to clearer code. We implemented all the algorithms presented in this paper, in order to analyze examples and derive the results in Section 5, or to test huge SLPs. Once a software library for tropical vectors and matrices is set up, the implementation is just a direct transcription of the algebraic formulae given in the paper.

References

[1] L. Barker, P. Fleischmann, K. Harwardt, F. Manea, and D. Nowotka. Scattered factor-universality of words. In Proc. DLT 2020, volume 12086 of Lecture Notes in Computer Science, pages 14–28. Springer, 2020. doi:10.1007/978-3-030-48516-0_2.
[2] M. Bojańczyk, L. Segoufin, and H. Straubing. Piecewise testable tree languages. Logical Methods in Comp. Science, 8(3), 2012. doi:10.2168/LMCS-8(3:26)2012.
[3] P. Butkovič. Max-algebra: the linear algebra of combinatorics? Linear Algebra and its Applications, 367:313–335, 2003. doi:10.1016/S0024-3795(02)00655-9.
[4] O. Carton and M. Pouzet. Simon’s theorem for scattered words. In Proc. DLT 2018, volume 11088 of Lecture Notes in Computer Science, pages 182–193. Springer, 2018. doi:10.1007/978-3-319-98654-8_15.
[5] J. D. Day, P. Fleischmann, M. Kosche, T. Koß, F. Manea, and S. Siemer. The edit distance to $k$ -subsequence universality. In Proc. STACS 2021, volume 187 of Leibniz International Proceedings in Informatics, pages 25:1–25:19. Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.STACS.2021.25.
[6] V. Diekert, P. Gastin, and M. Kufleitner. A survey on small fragments of first-order logic over finite words. Int. J. Foundations of Computer Science, 19(3):513–548, 2008. doi:10.1142/S0129054108005802.
[7] M. Droste, W. Kuich, and H. Vogler, editors. Handbook of Weighted Automata. Monographs in Theoretical Computer Science. Springer, 2009.
[8] L. Fleischer and M. Kufleitner. Testing Simon’s congruence. In Proc. MFCS 2018, volume 117 of Leibniz International Proceedings in Informatics, pages 62:1–62:13. Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.MFCS.2018.62.
[9] S. Gaubert and Max Plus. Methods and applications of (max, +) linear algebra. In Proc. STACS ’97, volume 1200 of Lecture Notes in Computer Science, pages 261–282. Springer, 1997. doi:10.1007/BFb0023465.
[10] P. Gawrychowski, M. Kosche, T. Koß, F. Manea, and S. Siemer. Efficiently testing Simon’s congruence. In Proc. STACS 2021, volume 187 of Leibniz International Proceedings in Informatics, pages 34:1–34:18. Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.STACS.2021.34.
[11] J. Goubault-Larrecq and S. Schmitz. Deciding piecewise testable separability for regular tree languages. In Proc. ICALP 2016, volume 55 of Leibniz International Proceedings in Informatics, pages 97:1–97:15. Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ICALP.2016.97.
[12] S. Halfon, Ph. Schnoebelen, and G. Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1–12. IEEE Comp. Soc. Press, 2017. doi:10.1109/LICS.2017.8005141.
[13] B. Heidergott, G. J. Olsder, and J. van der Woude. Max Plus at Work. Modelling and Analysis of Synchronized Systems: A Course on Max-Plus Algebra and its Applications. Princeton Series in Applied Mathematics. Princeton University Press, 2006.
[14] P. Karandikar, M. Kufleitner, and Ph. Schnoebelen. On the index of Simon’s congruence for piecewise testability. Information Processing Letters, 115(4):515–519, 2015. doi:10.1016/j.ipl.2014.11.008.
[15] P. Karandikar and Ph. Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Logical Methods in Comp. Science, 15(2), 2019. doi:10.23638/LMCS-15(2:6)2019.
[16] V. I. Levenshtein. Efficient reconstruction of sequences from their subsequences or supersequences. Journal of Combinatorial Theory, Series A, 93(2):310–332, 2001. doi:10.1006/jcta.2000.3081.
[17] M. Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241–299, 2012. doi:10.1515/gcc-2012-0016.
[18] S. Lombardy and J. Mairesse. Max-plus automata. In Jean-Éric Pin, editor, Handbook of Automata Theory. Volume I: Theoretical Foundations, chapter 5, pages 151–188. EMS Press, 2021. doi:10.4171/Automata-1/5.
[19] D. G. Northcott. Multilinear Algebra. Cambridge Univ. Press, 1984.
[20] P. P. Pach. Normal forms under Simon’s congruence. Semigroup Forum, 97:251–267, 2018. doi:10.1007/s00233-017-9910-5.
[21] J.-É. Pin. The influence of Imre Simon’s work in the theory of automata, languages and semigroups. Semigroup Forum, 98(1–8), 2019. doi:10.1007/s00233-019-09999-8.
[22] J.-É. Pin, J. M. Taylor, and M. Atiyah. Tropical semirings. In J. Gunawardena, editor, Idempotency, pages 50–69. Cambridge Univ. Press, 1998. doi:10.1017/CBO9780511662508.004.
[23] M. Praveen, Ph. Schnoebelen, J. Veron, and I. Vialard. On the piecewise complexity of words and periodic words. In Proc. SOFSEM 2024, volume 14519 of Lecture Notes in Computer Science, pages 456–470. Springer, 2024. doi:10.1007/978-3-031-52113-3_32.
[24] J. Sakarovitch and I. Simon. Subwords. In M. Lothaire, editor, Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and Its Applications, chapter 6, pages 105–142. Cambridge Univ. Press, 1983.
[25] Ph. Schnoebelen and J. Veron. On arch factorization and subword universality for words and compressed words. In Proc. WORDS 2023, volume 13899 of Lecture Notes in Computer Science, pages 274–287. Springer, 2023. doi:10.1007/978-3-031-33180-0_21.
[26] Ph. Schnoebelen and I. Vialard. On the piecewise complexity of words. Acta Informatica, 62(1), 2025. doi:10.1007/s00236-025-00480-4.
[27] I. Simon. Hierarchies of Events with Dot-Depth One. PhD thesis, University of Waterloo, Dept. Applied Analysis and Computer Science, 1972. URL: http://maveric.uwaterloo.ca/reports/1972_Simon_PhD.pdf.
[28] I. Simon. Piecewise testable events. In Proc. 2nd GI Conf. on Automata Theory and Formal Languages, volume 33 of Lecture Notes in Computer Science, pages 214–222. Springer, 1975. doi:10.1007/3-540-07407-4_23.
[29] I. Simon. Limited subsets of a free monoid. In Proc. FOCS ’78, pages 143–150. IEEE Comp. Soc. Press, 1978. doi:10.1109/SFCS.1978.21.

Appendix A Proofs omitted from main text

A.1 Proof of Lemma 5.4

See 5.4

We first observe that

\displaystyle M_{u}^{\Sigma}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-29-21.pdf2svg.svg}\end{matrix}\>,

\displaystyle N_{u}^{\Sigma}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-29-29.pdf2svg.svg}\end{matrix}\>.

As a consequence, and when $i\leq m^{\prime}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}m-1$ , a $\bm{\delta}_{i,j}^{\Sigma}$ in $F_{u}^{\Sigma}$ is $\bm{\delta}_{i,j}^{\Gamma}$ extended with one $\ast$ . When $i=m$ , $\bm{\delta}_{i,j}^{\Gamma}$ is $\bigl{(}\alpha_{1}\;\cdots\;\alpha_{m^{\prime}}\;0)$ where $\alpha_{s}$ is $1$ if $a_{s}$ occurs in $u(,j)$ , and is $\ast$ otherwise. The same reasoning applies to a $\bm{\gamma}_{i,j}^{\Sigma}$ in $G_{u}^{\Sigma}$ , except that now, when $i=m$ , one considers occurrences in $u(j,)$ .

Looking now at $H_{u}^{\Sigma}$ , a tensor $\bm{\delta}^{\Sigma}_{i,j}\otimes\bm{\gamma}^{\Sigma}_{i,j}$ with $i\leq m^{\prime}$ is just the corresponding $\bm{\eta}_{i,j}^{\Gamma}$ with $2m-1$ extra $\ast$ ’s inserted at fixed positions. Therefore, for $i,i^{\prime}<m$ , the tensors at positions $(i,j)$ and $(i^{\prime},j^{\prime})$ are ordered (or incomparable) in $H_{u}^{\Sigma}$ exactly as in $H_{u}^{\Gamma}$ . We also have to consider the new tensors at positions $(m,j)$ : they all end with $0$ hence cannot dominate any tensor on a line with $i<m$ (these end with $\ast$ ). However they can be incomparable with these and contribute new maximal elements. Now, there can be at most $2m^{\prime}+1$ incomparable tensors on the line $i=m$ since they can be ordered with pairs $(p,p^{\prime})\in P$ as in the proof of Lemma 5.3.

Finally $H_{u}^{\Sigma}$ contains all the maximal elements of $H_{u}^{\Gamma}$ (duly extended) and at most $2m^{\prime}+1$ , i.e., $2m-1$ , new maximal elements taken from the last (new) row. Hence the claim.

A.2 Proof of Theorem 5.5

See 5.5

We have to define a word $u_{k}$ over a $k$ -letter alphabet and show that $\#E_{u_{k}}\geq k(k+1)$ .

Fix $\Sigma=\{a_{1},\ldots,a_{k}\}$ , sometimes written $\{\mathtt{A},\mathtt{B},\mathtt{C},\ldots\}$ in examples. We let $W_{k}\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}a_{1}a_{2}\cdots a_{k}$ denote the word that lists the $k$ letters in order. For $0\leq j\leq k$ we further write $W_{j,k}$ for the suffix $W(k-j,k)$ , i.e., the $j$ last letters of $W_{k}$ . We then define $v_{k}$ and $u_{k}$ via

\displaystyle v_{k}

\displaystyle\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}W_{1,k}W_{2,k% }\cdots W_{k-1,k}\>,

\displaystyle u_{k}

\displaystyle\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}W_{k}\,v_{k}% \,a_{1}\,v_{k}^{\mathsf{R}}\,W_{k}^{\mathsf{R}}\>.

Observe that $u_{k}$ is a palindrome. As an example, here are $u_{k}$ and $v_{k}$ for $k=4$ :

\displaystyle v_{4}

\displaystyle=W_{1,4}W_{2,4}W_{3,4}=\mathtt{D}\;\mathtt{CD}\;\mathtt{BCD}\>,

\displaystyle u_{4}

\displaystyle=\underbrace{\mathtt{ABCD}}_{W_{4}}\;\underbrace{\mathtt{DCDBCD}}% _{v_{4}}\;\mathtt{A}\;\underbrace{\mathtt{DCBDCD}}_{v_{4}^{\mathsf{R}}}\;% \underbrace{\mathtt{DCBA}}_{W_{4}^{\mathsf{R}}}\>.

We start by computing the extension matrices associated with various prefixes and suffixes of $u_{k}$ . An easy induction on $k$ provides the following $k$ -by- $k$ matrices for any $k\geq 2$ :

\displaystyle M_{W_{k}}=N_{W_{k}^{\mathsf{R}}}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-29-46.pdf2svg.svg}\end{matrix}\>,

\displaystyle N_{W_{k}}=M_{W_{k}^{\mathsf{R}}}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-29-56.pdf2svg.svg}\end{matrix}\>.

From these, we can deduce the $k$ -by- $k$ extension matrices for the $W_{n,k}$ where $0\leq n\leq k$ :

\displaystyle M_{W_{n,k}}=N_{W_{n,k}^{\mathsf{R}}}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-30-11.pdf2svg.svg}\end{matrix}\>,

\displaystyle M_{W_{n,k}^{\mathsf{R}}}=N_{W_{n,k}}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-30-20.pdf2svg.svg}\end{matrix}\>.

Combining the above matrices, we compute the extension matrix for $v_{k}$ :

M_{v_{k}}=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-svg-% displaymath-2025-07-24-19-30-29.pdf2svg.svg}\end{matrix}

with $3$ as the maximal value, filling the bottom-right part. A consequence is that $M_{W_{k}\,v_{k}\,a_{1}}$ , obtained as the product $M_{W_{k}}\cdot M_{v_{k}}\cdot M_{a_{1}}$ is the $k$ -by- $k$ all- $2$ s matrix.

Let us now compute $N_{v_{k}}$ and $M_{W_{k}\,v_{k}\,a_{1}\,v_{k}^{\mathsf{R}}}$ , noting that $W_{k}\,v_{k}\,a_{1}\,v_{k}^{\mathsf{R}}$ is $u_{k}(k,)^{\mathsf{R}}$ :

\displaystyle N_{v_{k}}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-30-38.pdf2svg.svg}\end{matrix}\>,

\displaystyle M_{W_{k}\,v_{k}\,a_{1}\,v_{k}^{\mathsf{R}}}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-30-46.pdf2svg.svg}\end{matrix}\>.

Finally, for a factorization $u_{k}=u_{k}(,n)u_{k}(n,)$ with $n<k$ , we compute $M_{u_{k}(,n)}$ and $N_{u_{k}(n,)}$ , noting that $u_{k}(,n)=W_{n}$ and $u_{k}(n,)^{\mathsf{R}}=W_{k}v_{k}a_{1}v_{k}^{\mathsf{R}}a_{k}a_{k{-}1}\ldots a% _{n{+}1}$ .

\displaystyle M_{u_{k}(,n)}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-30-58.pdf2svg.svg}\end{matrix}\>,

\displaystyle N_{u_{k}(n,)}

\displaystyle=\begin{matrix}\includegraphics{dagpub-standalone-manual-dagpub-% svg-displaymath-2025-07-24-19-31-17.pdf2svg.svg}\end{matrix}\>.

Lemma A.1.

$H_{u_{k}}$ contains all the $M_{u_{k}(,n)}[\cdot,i]\otimes N_{u_{k}(n,)}[\cdot,i]$ and the $N_{u_{k}(n,)}[\cdot,i]\otimes M_{u_{k}(,n)}[\cdot,i]$ for $0\leq n<i\leq k$ .

Proof.

Indeed this just expresses the $\bm{\delta}_{i,n}$ and $\bm{\gamma}_{i,n}$ vectors of $F_{u_{k}}$ and $G_{u_{k}}$ as columns of the matrices we computed. Note that, in the claim, the products of the first kind are taken from the $k$ first columns of $H_{u}$ , while the products of the second kind come from its $k$ last columns but are expressed in terms of the first columns of $F_{u_{k}}$ and $G_{u_{k}}$ , taking advantage of the symmetry in $u_{k}$ . $\hfill\blacktriangleleft$

Looking now at the matrix for $M_{u_{k}(,n)}$ we see that, for $n<i$ , the $i$ th column is $U(i,n)$ in the notation used for Lemma 5.2, while the $i$ th column of $N_{u_{k}(,n)}$ is $T(n+2,k,k)$ .

Lemma A.2.

The set

S_{k}\;\stackrel{{\scriptstyle\mbox{\scriptsize def}}}{{=}}\;\Bigl{\{}U(i,n)% \otimes T(n+2,k,k),\;T(n+2,k,k)\otimes U(i,n)\;\Big{|}\;0\leq n<i\leq k\Bigr{\}}

is an antichain in $H_{u_{k}}$ .

Proof (Sketch).

We know that these tensors are in $H_{u_{k}}$ since they are exactly those listed in Lemma A.1, now expressed as $T$ and $U$ patterns. Stated in this form it is easy to see that they are pairwise incomparable, hence form an antichain. $\hfill\blacktriangleleft$ We now conclude with the announced result:

Lemma A.3.

$\#E_{u_{k}}\geq k(k+1)$ .

Proof.

After the above preparations, and since $S_{k}$ has cardinal $k(k+1)$ , it is enough to prove that the tensors in $S_{k}$ are maximal in $H_{u_{k}}$ . However we are not ready to do precisely that in this extended abstract.

Instead, consider that every $U(i,n)$ for all $0\leq n<k-1$ , $n<i\leq k$ contains a $\ast$ . If there exist a factorization $u_{k}=v^{\prime}w^{\prime}$ and some $i^{\prime}\leq k$ such that $U(i,n)\otimes T(n+2,k,k)$ is strictly dominated by $M_{v^{\prime}}[\cdot,i^{\prime}]\otimes N_{{w^{\prime}}}[\cdot,i^{\prime}]$ , then necessarily $U(i,n)\leq M_{v^{\prime}}[\cdot,i^{\prime}]$ , thus $M_{v^{\prime}}[\cdot,i^{\prime}]$ contains a $\ast$ at every position where $U(i,n)$ does. Hence we deduce that either $v^{\prime}$ is a prefix of $u_{k}(,n)$ and $i=i^{\prime}$ , or $v^{\prime}=v_{k}(,n+1)=v_{k}(,n)a_{n+1}$ with $i=n+1$ and $i^{\prime}\leq n+1$ . However, in these few specific cases that are easy to list, we can readily check that the full product $U(i,n)\otimes T(m+2,k,k)$ is not dominated by $M_{v^{\prime}}[\cdot,i^{\prime}]\otimes N_{{w^{\prime}}}[\cdot,i^{\prime}]$ . So all but two tensors in $S_{k}$ are already seen to be maximal.

There remains the case $n=k-1$ and $i=k$ . Here it is harder to show that $U(k,k-1)\otimes T(k+1,k,k)$ , i.e., $\bm{\eta}_{k,k-1}$ , is maximal since this tensor has no $\ast$ and thus could possibly be dominated by some $\bm{\eta}_{i^{\prime},j}$ for any $i^{\prime},j$ with $k\leq j\leq|u_{k}|-k$ , i.e., for a factorization $u_{k}=v^{\prime}w^{\prime}$ where both $v^{\prime}$ and $w^{\prime}$ contain the full alphabet.

However, and since $\bm{\eta}_{k,k-1}$ , is incomparable with the rest of $S_{k}$ , if $\bm{\eta}_{k,k-1}$ is not maximal, then $H_{u_{k}}$ contains a tensor that dominates $\bm{\eta}_{k,k-1}$ , hence that necessarily contains no $\ast$ and thus does not dominate any other tensor from $S_{k}$ . (The same argument goes for the symmetric tensor on the right-hand side of $H_{u_{k}}$ .) Finally, the announced $k(k+1)$ lower bound on the size of $\#E_{u_{k}}$ is demonstrated. $\hfill\blacktriangleleft$

A.3 Proof of Theorem 6.1

See 6.1 First, if $\ast$ occurs in $M_{u}$ , then $u$ does not contain every letter of the alphabet, hence $\iota(u)=0$ .

In the other cases, we consider the arch factorization of $u$ (see [25]). Observe that when $u$ contains exactly one arch and ends with letter $a_{i}$ , the $i$ -th row of $M_{u}$ is a row of $1$ , while in the other rows only the numbers $1$ and $2$ appear.

If $\iota(u)=k\in\mathbb{N}$ then $u$ can be factorized as $u_{1}\dots u_{k}v$ where every $u_{j}$ is an arch and $v$ does not contain every letter of the alphabet. Let $a_{i}$ be the last letter of $u_{1}$ . Then observe that, $M_{u_{1}\dots u_{k}}$ contains only the numbers $k$ and $k+1$ , and its $i$ -th row is a row of $k$ ’s. Thus, $M_{u}=M_{u_{1}\dots u_{k}}\cdot M_{v}$ contains only the numbers $k$ and $k+1$ , and $M_{u}[i,j]=k$ for any $j$ such that $a_{j}$ does not appear in $v$ .

[bib.bib1] [1] L. Barker, P. Fleischmann, K. Harwardt, F. Manea, and D. Nowotka. Scattered factor-universality of words. In Proc. DLT 2020, volume 12086 of Lecture Notes in Computer Science, pages 14–28. Springer, 2020. doi:10.1007/978-3-030-48516-0_2.

[bib.bib2] [2] M. Bojańczyk, L. Segoufin, and H. Straubing. Piecewise testable tree languages. Logical Methods in Comp. Science, 8(3), 2012. doi:10.2168/LMCS-8(3:26)2012.

[bib.bib3] [3] P. Butkovič. Max-algebra: the linear algebra of combinatorics? Linear Algebra and its Applications, 367:313–335, 2003. doi:10.1016/S0024-3795(02)00655-9.

[bib.bib4] [4] O. Carton and M. Pouzet. Simon’s theorem for scattered words. In Proc. DLT 2018, volume 11088 of Lecture Notes in Computer Science, pages 182–193. Springer, 2018. doi:10.1007/978-3-319-98654-8_15.

[bib.bib5] [5] J. D. Day, P. Fleischmann, M. Kosche, T. Koß, F. Manea, and S. Siemer. The edit distance to $k$ -subsequence universality. In Proc. STACS 2021, volume 187 of Leibniz International Proceedings in Informatics, pages 25:1–25:19. Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.STACS.2021.25.

[bib.bib6] [6] V. Diekert, P. Gastin, and M. Kufleitner. A survey on small fragments of first-order logic over finite words. Int. J. Foundations of Computer Science, 19(3):513–548, 2008. doi:10.1142/S0129054108005802.

[bib.bib7] [7] M. Droste, W. Kuich, and H. Vogler, editors. Handbook of Weighted Automata. Monographs in Theoretical Computer Science. Springer, 2009.

[bib.bib8] [8] L. Fleischer and M. Kufleitner. Testing Simon’s congruence. In Proc. MFCS 2018, volume 117 of Leibniz International Proceedings in Informatics, pages 62:1–62:13. Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.MFCS.2018.62.

[bib.bib9] [9] S. Gaubert and Max Plus. Methods and applications of (max, +) linear algebra. In Proc. STACS ’97, volume 1200 of Lecture Notes in Computer Science, pages 261–282. Springer, 1997. doi:10.1007/BFb0023465.

[bib.bib10] [10] P. Gawrychowski, M. Kosche, T. Koß, F. Manea, and S. Siemer. Efficiently testing Simon’s congruence. In Proc. STACS 2021, volume 187 of Leibniz International Proceedings in Informatics, pages 34:1–34:18. Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.STACS.2021.34.

[bib.bib11] [11] J. Goubault-Larrecq and S. Schmitz. Deciding piecewise testable separability for regular tree languages. In Proc. ICALP 2016, volume 55 of Leibniz International Proceedings in Informatics, pages 97:1–97:15. Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ICALP.2016.97.

[bib.bib12] [12] S. Halfon, Ph. Schnoebelen, and G. Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1–12. IEEE Comp. Soc. Press, 2017. doi:10.1109/LICS.2017.8005141.

[bib.bib13] [13] B. Heidergott, G. J. Olsder, and J. van der Woude. Max Plus at Work. Modelling and Analysis of Synchronized Systems: A Course on Max-Plus Algebra and its Applications. Princeton Series in Applied Mathematics. Princeton University Press, 2006.

[bib.bib14] [14] P. Karandikar, M. Kufleitner, and Ph. Schnoebelen. On the index of Simon’s congruence for piecewise testability. Information Processing Letters, 115(4):515–519, 2015. doi:10.1016/j.ipl.2014.11.008.

[bib.bib15] [15] P. Karandikar and Ph. Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Logical Methods in Comp. Science, 15(2), 2019. doi:10.23638/LMCS-15(2:6)2019.

[bib.bib16] [16] V. I. Levenshtein. Efficient reconstruction of sequences from their subsequences or supersequences. Journal of Combinatorial Theory, Series A, 93(2):310–332, 2001. doi:10.1006/jcta.2000.3081.

[bib.bib17] [17] M. Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241–299, 2012. doi:10.1515/gcc-2012-0016.

[bib.bib18] [18] S. Lombardy and J. Mairesse. Max-plus automata. In Jean-Éric Pin, editor, Handbook of Automata Theory. Volume I: Theoretical Foundations, chapter 5, pages 151–188. EMS Press, 2021. doi:10.4171/Automata-1/5.

[bib.bib19] [19] D. G. Northcott. Multilinear Algebra. Cambridge Univ. Press, 1984.

[bib.bib20] [20] P. P. Pach. Normal forms under Simon’s congruence. Semigroup Forum, 97:251–267, 2018. doi:10.1007/s00233-017-9910-5.

[bib.bib21] [21] J.-É. Pin. The influence of Imre Simon’s work in the theory of automata, languages and semigroups. Semigroup Forum, 98(1–8), 2019. doi:10.1007/s00233-019-09999-8.

[bib.bib22] [22] J.-É. Pin, J. M. Taylor, and M. Atiyah. Tropical semirings. In J. Gunawardena, editor, Idempotency, pages 50–69. Cambridge Univ. Press, 1998. doi:10.1017/CBO9780511662508.004.

[bib.bib23] [23] M. Praveen, Ph. Schnoebelen, J. Veron, and I. Vialard. On the piecewise complexity of words and periodic words. In Proc. SOFSEM 2024, volume 14519 of Lecture Notes in Computer Science, pages 456–470. Springer, 2024. doi:10.1007/978-3-031-52113-3_32.

[bib.bib24] [24] J. Sakarovitch and I. Simon. Subwords. In M. Lothaire, editor, Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and Its Applications, chapter 6, pages 105–142. Cambridge Univ. Press, 1983.

[bib.bib25] [25] Ph. Schnoebelen and J. Veron. On arch factorization and subword universality for words and compressed words. In Proc. WORDS 2023, volume 13899 of Lecture Notes in Computer Science, pages 274–287. Springer, 2023. doi:10.1007/978-3-031-33180-0_21.

[bib.bib26] [26] Ph. Schnoebelen and I. Vialard. On the piecewise complexity of words. Acta Informatica, 62(1), 2025. doi:10.1007/s00236-025-00480-4.

[bib.bib27] [27] I. Simon. Hierarchies of Events with Dot-Depth One. PhD thesis, University of Waterloo, Dept. Applied Analysis and Computer Science, 1972. URL: http://maveric.uwaterloo.ca/reports/1972_Simon_PhD.pdf.

[bib.bib28] [28] I. Simon. Piecewise testable events. In Proc. 2nd GI Conf. on Automata Theory and Formal Languages, volume 33 of Lecture Notes in Computer Science, pages 214–222. Springer, 1975. doi:10.1007/3-540-07407-4_23.

[bib.bib29] [29] I. Simon. Limited subsets of a free monoid. In Proc. FOCS ’78, pages 143–150. IEEE Comp. Soc. Press, 1978. doi:10.1109/SFCS.1978.21.

A Tropical Approach to the Compositional Piecewise Complexity of Words and Compressed Words

Abstract

Keywords and phrases:

Category:

Copyright and License:

2012 ACM Subject Classification:

Acknowledgements:

DOI:

Event:

Editors:

Series and Publisher:

1 Introduction

Subwords and piecewise complexity

Our contribution

Related work

Outline of the paper

2 Basic notions

2.1 Subwords and the piecewise complexity of words

2.2 Computing 𝒉⁢(𝒖)

Lemma 2.1 ([26, Lem. 3.8]).

Example 2.2.

3 Tropical algebra and piecewise complexity

3.1 The tropical semiring

3.2 A tropical view of the 𝒓- and ℓ-tables

Definition 3.1.

Example 3.2.

Lemma 3.3.

Proof.

3.3 Compositional computation of 𝑹𝒖

Definition 3.4 (f-r-table).

Lemma 3.5.

Proof.

Example 3.6.

3.4 Summing 𝒓− and ℓ-tables

Definition 3.7 (Summing Fu and Gu).

Definition 3.8 (Piecewise signatures).

Example 3.9.

Lemma 3.10.

Proof.

Theorem 3.11 (Piecewise complexity from signatures).

Proof.

Example 3.12.

Theorem 3.13 (Space complexity of piecewise signatures).

Proof.

3.5 Combining piecewise signatures

Theorem 3.14 (Compositionality).

Lemma 3.15.

Proof (of Lemma 3.15).

Lemma 3.16.

Proof.

Proof of Theorem 3.14.

4 Piecewise complexity of SLP-compressed words

Theorem 4.1 (Piecewise signatures of compressed words).

Proof.

Corollary 4.2.

5 The size of piecewise signatures

Theorem 5.1.

Lemma 5.2.

Proof.

Lemma 5.3.

Proof.

Lemma 5.4.

Proof of Theorem 5.1.

Theorem 5.5 (#⁢Eu is in Ω⁢(m2)).

6 Concluding remarks

Theorem 6.1.

References

Appendix A Proofs omitted from main text

A.1 Proof of Lemma 5.4

A.2 Proof of Theorem 5.5

Lemma A.1.

Proof.

Lemma A.2.

Proof (Sketch).

Lemma A.3.

Proof.

A.3 Proof of Theorem 6.1

2.2 Computing $h(u)$

3.2 A tropical view of the $𝒓$ - and $\ell$ -tables

3.3 Compositional computation of $R_{u}$

Definition 3.4 (f- $r$ -table).

3.4 Summing $r-$ and $\ell$ -tables

Definition 3.7 (Summing $F_{u}$ and $G_{u}$ ).

Theorem 5.5 ( $\#E_{u}$ is in $\Omega(m^{2})$ ).