
Core-Sparse Monge Matrix Multiplication

Paweł Gawrychowski, University of Wrocław, Poland; Egor Gorbachev, Saarland University and Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; Tomasz Kociumaka, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Abstract

Min-plus matrix multiplication is a fundamental tool for designing algorithms operating on distances in graphs and for various problems solvable by dynamic programming. Assuming the APSP hypothesis, no subcubic-time algorithm exists for the case of general matrices. However, in many applications the matrices admit certain structural properties that can be used to design faster algorithms. For example, when considering a planar graph, one often works with a Monge matrix $A$, meaning that the density matrix $A^\square$ has non-negative entries, that is, $A^\square_{i,j} = A_{i+1,j} + A_{i,j+1} - A_{i,j} - A_{i+1,j+1} \ge 0$. The min-plus product of two $n \times n$ Monge matrices can be computed in $\mathcal{O}(n^2)$ time using the famous SMAWK algorithm.

In applications such as longest common subsequence, edit distance, and longest increasing subsequence, the matrices are even more structured, as observed by Tiskin [J. Discrete Algorithms, 2008]: they are (or can be converted to) simple unit-Monge matrices, meaning that the density matrix is a permutation matrix and, furthermore, the first column and the last row of the matrix consist of zeroes only. Such matrices admit an implicit representation of size $\mathcal{O}(n)$ and, as shown by Tiskin [SODA 2010 & Algorithmica, 2015], their min-plus product can be computed in $\mathcal{O}(n \log n)$ time. Russo [SPIRE 2010 & Theor. Comput. Sci., 2012] identified a general structural property of matrices that admit such efficient representations and min-plus multiplication algorithms: the core size $\delta$, defined as the number of non-zero entries in the density matrices of the input and output matrices. He provided an adaptive implementation of the SMAWK algorithm that runs in $\mathcal{O}((n+\delta)\log^3 n)$ or $\mathcal{O}((n+\delta)\log^2 n)$ time (depending on the representation of the input matrices).

In this work, we further investigate the core size as the parameter that enables efficient min-plus matrix multiplication. On the combinatorial side, we provide a (linear) bound on the core size of the product matrix in terms of the core sizes of the input matrices. On the algorithmic side, we generalize Tiskin's algorithm (but, arguably, with a more elementary analysis) to solve the core-sparse Monge matrix multiplication problem in $\mathcal{O}(n + \delta\log\delta) \subseteq \mathcal{O}(n + \delta\log n)$ time, matching the complexity for simple unit-Monge matrices. As witnessed by the recent work of Gorbachev and Kociumaka [STOC'25] for edit distance with integer weights, our generalization opens up the possibility of speed-ups for weighted sequence alignment problems. Furthermore, our multiplication algorithm is also capable of producing an efficient data structure for recovering the witness for any given entry of the output matrix. This allows us, for example, to preprocess an integer array of size $n$ in $\tilde{\mathcal{O}}(n)$ time so that the longest increasing subsequence of any sub-array can be reconstructed in $\tilde{\mathcal{O}}(\ell)$ time, where $\ell$ is the length of the reported subsequence. In comparison, Karthik C. S. and Rahul [arXiv, 2024] recently achieved $\mathcal{O}(\ell + n^{1/2}\operatorname{polylog} n)$-time reporting after $\mathcal{O}(n^{3/2}\operatorname{polylog} n)$-time preprocessing.

Keywords and phrases:
Min-plus matrix multiplication, Monge matrix, longest increasing subsequence
Funding:
Egor Gorbachev: This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 850979).
Copyright and License:
© Paweł Gawrychowski, Egor Gorbachev, and Tomasz Kociumaka; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Related Version:
Full Version: https://arxiv.org/abs/2408.04613v2 [21]
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

The min-plus product (also known as the distance product or the tropical product) of two matrices $A$ and $B$ is defined as the matrix $C = A \otimes B$ such that $C_{i,k} = \min_j (A_{i,j} + B_{j,k})$. The task of computing the min-plus product of two $n \times n$ matrices can be solved in $n^3/\exp(\Omega(\sqrt{\log n}))$ time [46], and it is fine-grained equivalent to the All-Pairs Shortest Paths (APSP) problem [44], asking to compute the distances between every pair of vertices in a directed weighted graph on $n$ vertices. While it is conjectured that APSP, and hence also the min-plus product, do not admit $n^{3-\Omega(1)}$-time solutions, faster algorithms exist for many special cases arising in numerous applications of the min-plus product; see, e.g., [2, 43, 49, 10, 6, 47, 23, 15, 16, 17]. Most of the underlying procedures rely on fast matrix multiplication for the standard $(+,\times)$-product.

Monge matrices constitute a notable exception: An $n \times n$ matrix $A$ is a Monge matrix if its density matrix $A^\square$ is non-negative, that is, $A^\square_{i,j} = A_{i+1,j} + A_{i,j+1} - A_{i,j} - A_{i+1,j+1} \ge 0$ holds for all $i,j \in [0..n-1)$. The min-plus product of two $n \times n$ Monge matrices can be computed in $\mathcal{O}(n^2)$ time using the SMAWK algorithm [1], and the resulting matrix still satisfies the Monge property. Monge matrices arise in many combinatorial optimization problems; see [8, 7] for surveys. One of the most successful applications is for planar graphs, where the distances between vertices on a single face satisfy the Monge property (see [18, Section 2.3]). This observation allowed for an $\tilde{\mathcal{O}}(n)$-time single-source shortest path algorithm for planar graphs (with negative real weights) [18], and the resulting techniques now belong to the standard toolkit for designing planar graph algorithms; see, e.g., [5, 25, 11]. (Throughout this paper, we use the $\tilde{\mathcal{O}}(\cdot)$ notation to suppress factors poly-logarithmic in the input size.)

Another important application of Monge matrices is in sequence alignment problems, such as edit distance and longest common subsequence (LCS), as well as in the related longest increasing subsequence (LIS) problem. Already in the late 1980s, Apostolico, Atallah, Larmore, and McFaddin [3] noted that the so-called DIST matrices, which (among others) specify the weighted edit distances between prefixes of one string and suffixes of another string, satisfy the Monge property. A modern interpretation of this observation is that these matrices store boundary-to-boundary distances in planar alignment graphs [37]; see also [4, 26, 32] for further early applications of DIST matrices and their Monge property.

In the late 2000s, Tiskin [39, 42] observed that the DIST matrices originating from the unweighted variants of the edit distance, LCS, and LIS problems are more structured. For this, he introduced the notions of unit-Monge matrices, whose density matrices are permutation matrices (that is, binary matrices whose rows and columns each contain exactly one 1-entry), and simple Monge matrices, whose leftmost column and bottommost row consist of zeroes. He also proved that the product of two simple unit-Monge matrices still belongs to this class and can be computed in $\mathcal{O}(n \log n)$ time provided that each matrix $A$ is represented using the underlying permutation $P_A$ [41]. By now, the resulting algorithm has found numerous applications, including computing the LCS and edit distance of compressed strings [20, 24, 41, 19], maintaining these similarity measures for dynamic strings [13, 22], approximate pattern matching [40, 14], parallel and distributed algorithms for similarity measures [31, 33], and oracles for substring similarity [35, 12, 36]. Furthermore, Tiskin's algorithm has been used to solve the LIS problem in various settings, such as dynamic [28], parallel [9], and distributed [30]. A disadvantage of Tiskin's original description (and even of the later informal descriptions [29]) is its dependence on the algebraic structure known as the monoid of seaweed braids, which natively supports unweighted LCS only (tasks involving edit distance need to be reduced to LCS counterparts). This makes the algorithm difficult to generalize to weighted problems and to extend even to seemingly simple questions such as recovering (an implicit representation of) the witness indices $j$ such that $C_{i,k} = A_{i,j} + B_{j,k}$ [27].

Russo [34] identified the number of non-zero elements in $A^\square$, $B^\square$, and $(A \otimes B)^\square$, called the core size $\delta$, as the parameter that enables fast min-plus matrix multiplication. It is easy to see that $A$, $B$, and $A \otimes B$ can be stored in $\mathcal{O}(n+\delta)$ space using a condensed representation, consisting of the boundary entries (the leftmost column and the bottommost row, e.g., suffice) and the core elements, i.e., the non-zero entries of the density matrix. Then, Russo's algorithm for the core-sparse Monge matrix multiplication problem computes the condensed representation of $A \otimes B$ in $\mathcal{O}((n+\delta)\log^2 n)$ time when provided with constant-time random access to $A$ and $B$, and in $\mathcal{O}((n+\delta)\log^3 n)$ time given the condensed representations of $A$ and $B$. (More precisely, Russo's algorithm builds a data structure that provides $\mathcal{O}(\log n)$-time random access to the entries of $A \otimes B$. Consequently, for repeated multiplication, we cannot assume constant-time access to the input matrices; hence, $\mathcal{O}((n+\delta)\log^3 n)$ is the more realistic bound.) Russo's algorithm has a very different structure than Tiskin's: it relies on a clever adaptation of the SMAWK algorithm [1] to provide efficient random access to $C$ and then employs binary search to find the individual non-zero entries of the density matrix $C^\square$. This brings the question of unifying both approaches and understanding the complexity of core-sparse Monge matrix multiplication.

Our Results.

We consider the core-sparse Monge matrix multiplication problem from the combinatorial and algorithmic points of view, and confirm that the core size is the right parameter that enables fast min-plus matrix multiplication. Let $\delta(A)$, the core size of $A$, denote the number of non-zero elements in $A^\square$. We begin by analyzing, in Section 3, how the core size of $A \otimes B$ depends on the core sizes of $A$ and $B$.

Theorem 1.1.

Let $A$ be a $p \times q$ Monge matrix, and let $B$ be a $q \times r$ Monge matrix. We have $\delta(A \otimes B) \le 2 \cdot (\delta(A) + \delta(B))$.

We stress that this is the first bound on $\delta(A \otimes B)$ in terms of $\delta(A)$ and $\delta(B)$ alone: the complexity analysis of Russo's algorithm [34] requires bounds on all of $\delta(A)$, $\delta(B)$, and $\delta(A \otimes B)$.

Next, in Section 4, we generalize Tiskin's algorithm (while fully avoiding the formalism of the seaweed product) to solve the core-sparse Monge matrix multiplication problem. We believe that the more elementary interpretation makes our viewpoint not only more robust but also easier to understand. At the same time, the extension from simple unit-Monge matrices to general core-sparse Monge matrices introduces a few technical complications handled in the full version of the paper [21]. Notably, we need to keep track of the leftmost column and the bottommost row instead of assuming that they are filled with zeroes. Further, the core does not form a permutation, so splitting it into two halves of equal size requires some calculations.

Theorem 1.2.

There is a (deterministic) algorithm that, given the condensed representations of a $p \times q$ Monge matrix $A$ and a $q \times r$ Monge matrix $B$, in time $\mathcal{O}(p + q + r + (\delta(A)+\delta(B)) \cdot \log(1+\delta(A)+\delta(B)))$ computes the condensed representation of $A \otimes B$. (In the technical sections of the paper, we use the $\mathcal{O}(\cdot)$-notation conservatively. Specifically, we interpret $\mathcal{O}(f(x_1,\dots,x_k))$ as the set of functions $g(x_1,\dots,x_k)$ for which there are constants $c_g, N_g > 0$ such that $g(x_1,\dots,x_k) \le c_g \cdot f(x_1,\dots,x_k)$ holds for all valid tuples $(x_1,\dots,x_k)$ satisfying $\max_i x_i \ge N_g$. Accordingly, whenever the expression inside $\mathcal{O}(\cdot)$ depends on multiple parameters, we sometimes add 1 or 2 to the arguments of logarithms to ensure formal correctness in corner cases.)

The above complexity improves upon Russo's [34] and matches Tiskin's [41] for the simple unit-Monge case. Thanks to the more direct description, we can easily extend our algorithm to build (within the same time complexity) an $\mathcal{O}(n+\delta)$-size data structure that, given $(i,k)$, computes in $\mathcal{O}(\log n)$ time the smallest witness $j$ such that $C_{i,k} = A_{i,j} + B_{j,k}$.

Applications.

As an application of our witness recovery functionality, we consider the problem of range LIS queries. This task asks to preprocess an integer array $s[0..n)$ so that, later, given two indices $0 \le i < j \le n$, the longest increasing subsequence (LIS) of the sub-array $s[i..j)$ can be reported efficiently. Tiskin [38, 41] showed that the LIS size can be reported in $\mathcal{O}(\log n)$ time after $\mathcal{O}(n \log^2 n)$-time preprocessing. It was unclear, though, how to efficiently report the LIS itself. The recent work of Karthik C. S. and Rahul [27] achieved $\mathcal{O}(n^{1/2}\log^3 n + \ell)$-time reporting (correct with high probability) after $\mathcal{O}(n^{3/2}\log^3 n)$-time preprocessing. It is fairly easy to use our witness recovery algorithm to deterministically support $\mathcal{O}(\ell \log^2 n)$-time reporting after $\mathcal{O}(n \log^2 n)$-time preprocessing; see Section 5 for an overview of this result. As further shown in the full version of the paper [21], the reporting time can be improved to $\mathcal{O}(\ell \log n)$ and, with the preprocessing time increased to $\mathcal{O}(n \log^3 n)$, all the way to $\mathcal{O}(\ell)$.

Theorem 1.3.

For every parameter $\alpha \in [0,1]$, there exists an algorithm that, given an integer array $s[0..n)$, in time $\mathcal{O}(n \log^{3-\alpha} n)$ builds a data structure that can answer range LIS reporting queries in time $\mathcal{O}(\ell \log^\alpha n)$, where $\ell$ is the length of the reported sequence.

In particular, there is an algorithm with $\mathcal{O}(n \log^3 n)$ preprocessing time and $\mathcal{O}(\ell)$ reporting time, as well as an algorithm with $\mathcal{O}(n \log^2 n)$ preprocessing time and $\mathcal{O}(\ell \log n)$ reporting time.

In parallel to this work, Gorbachev and Kociumaka [22] used core-sparse Monge matrix multiplication for weighted edit distance with integer weights. Theorem 1.2 allowed for saving two logarithmic factors in the final time complexities compared to their initial preprint, which used Russo's approach [34]. Weighted edit distance is known to be reducible to unweighted LCS only for a very limited class of so-called uniform weight functions [38], so this application requires the general core-sparse Monge matrix multiplication.

Open Problem.

An interesting open problem is whether any non-trivial trade-off can be achieved for the weighted version of range LIS queries, where each element of $s$ has a weight, and the task is to compute a maximum-weight increasing subsequence of $s[i..j)$: either the weight alone or the whole subsequence. Surprisingly, as we show in the full version of the paper [21], if our bound $\delta(A \otimes B) \le 2 \cdot (\delta(A)+\delta(B))$ of Theorem 1.1 can be improved to $\delta(A \otimes B) \le c \cdot (\delta(A)+\delta(B)) + \tilde{\mathcal{O}}(p+q+r)$ for some $1 \le c < 2$, then our techniques automatically yield a solution with $\tilde{\mathcal{O}}(n^{1+\log_2 c})$ preprocessing time, $\tilde{\mathcal{O}}(1)$ query time, and $\tilde{\mathcal{O}}(\ell)$ reporting time.

2 Preliminaries

Definition 2.1.

A matrix $A$ of size $p \times q$ is a Monge matrix if it satisfies the following Monge property for every $i \in [0..p-1)$ and $j \in [0..q-1)$:

$$A_{i,j} + A_{i+1,j+1} \le A_{i,j+1} + A_{i+1,j}.$$

Furthermore, $A$ is an anti-Monge matrix if the matrix $-A$ (with negated entries) is a Monge matrix.

In some sources, the following equivalent (but seemingly stronger) condition is taken as the definition of Monge matrices.

Observation 2.2.

A matrix $A$ of size $p \times q$ is a Monge matrix if and only if it satisfies the following inequality for all integers $0 \le a \le b < p$ and $0 \le c \le d < q$:

$$A_{a,c} + A_{b,d} \le A_{a,d} + A_{b,c}.$$

Proof.

It suffices to sum the inequality of Definition 2.1 over all $i \in [a..b)$ and $j \in [c..d)$.

Definition 2.3.

The min-plus product of a matrix $A$ of size $p \times q$ and a matrix $B$ of size $q \times r$ is the matrix $A \otimes B = C$ of size $p \times r$ satisfying $C_{i,k} = \min_{j \in [0..q)} (A_{i,j} + B_{j,k})$ for all $i \in [0..p)$ and $k \in [0..r)$.

For $i \in [0..p)$ and $k \in [0..r)$, we call $j \in [0..q)$ a witness of $C_{i,k}$ if and only if $C_{i,k} = A_{i,j} + B_{j,k}$. We define the $p \times r$ witness matrix $\mathcal{W}^{A,B}$ such that $\mathcal{W}^{A,B}_{i,k}$ is the smallest witness of $C_{i,k}$ for each $i \in [0..p)$ and $k \in [0..r)$.
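To make Definition 2.3 concrete, here is a minimal Python sketch (the function name and the list-of-lists representation are ours, not from the paper) that evaluates the definition directly in $\mathcal{O}(pqr)$ time, returning both the product and the matrix of smallest witnesses; the algorithm of Section 4 exists precisely to avoid this brute force on structured inputs.

```python
def min_plus_product(A, B):
    """Brute-force min-plus product C = A ⊗ B with smallest witnesses.

    A is p×q and B is q×r, given as lists of lists. Returns (C, W),
    where W[i][k] is the smallest j with C[i][k] = A[i][j] + B[j][k].
    """
    p, q, r = len(A), len(B), len(B[0])
    C = [[0] * r for _ in range(p)]
    W = [[0] * r for _ in range(p)]
    for i in range(p):
        for k in range(r):
            # min() over (value, index) pairs breaks ties by smallest j.
            C[i][k], W[i][k] = min((A[i][j] + B[j][k], j) for j in range(q))
    return C, W
```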

The min-plus product of two Monge matrices is also Monge.

Fact 2.4 ([48, Corollary A]).

Let $A$, $B$, and $C$ be matrices such that $A \otimes B = C$. If $A$ and $B$ are Monge, then $C$ is also Monge.

In the context of Monge matrices, it is useful to define the core and the density matrix.

Definition 2.5.

The density matrix of a matrix $A$ of size $p \times q$ is the matrix $A^\square$ of size $(p-1) \times (q-1)$ satisfying $A^\square_{i,j} = A_{i,j+1} + A_{i+1,j} - A_{i,j} - A_{i+1,j+1}$ for $i \in [0..p-1)$ and $j \in [0..q-1)$.

We define the core of the matrix $A$ as

$$\operatorname{core}(A) \coloneqq \{(i,j,A^\square_{i,j}) : i \in [0..p-1),\ j \in [0..q-1),\ A^\square_{i,j} \neq 0\}$$

and denote the core size of the matrix $A$ by $\delta(A) \coloneqq |\operatorname{core}(A)|$. Furthermore, we define the core sum of the matrix $A$ as the sum of the values of all core elements of $A$, that is,

$$\delta_\Sigma(A) \coloneqq \sum_{i \in [0..p-1)} \sum_{j \in [0..q-1)} A^\square_{i,j}.$$

Note that, for a Monge matrix $A$, all entries of its density matrix are non-negative, and thus $\operatorname{core}(A)$ consists of triples $(i,j,v)$ with positive values $v$.
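The following small Python sketch (our own illustration) implements Definition 2.5 literally:

```python
def density_matrix(A):
    """Density matrix A□ of a p×q matrix A (Definition 2.5)."""
    p, q = len(A), len(A[0])
    return [[A[i][j + 1] + A[i + 1][j] - A[i][j] - A[i + 1][j + 1]
             for j in range(q - 1)] for i in range(p - 1)]

def core(A):
    """Core of A: triples (i, j, A□[i][j]) over the non-zero density entries."""
    return [(i, j, v) for i, row in enumerate(density_matrix(A))
            for j, v in enumerate(row) if v != 0]

def delta(A):        # core size δ(A)
    return len(core(A))

def delta_sigma(A):  # core sum δΣ(A)
    return sum(v for _, _, v in core(A))
```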

For any matrix $A$ of size $p \times q$ and integers $0 \le a < b \le p$ and $0 \le c < d \le q$, we write $A_{[a..b)[c..d)}$ to denote the contiguous submatrix of $A$ consisting of all entries on the intersection of rows $[a..b)$ and columns $[c..d)$ of $A$. Matrices $A_{[a..b][c..d]}$, $A_{(a..b][c..d)}$, etc., are defined analogously.

3 Properties of $\delta$ and $\delta_\Sigma$

In this section, we provide some useful properties of $\delta$ and $\delta_\Sigma$. Most importantly, we show how to bound $\delta(A \otimes B)$ in terms of $\delta(A)$ and $\delta(B)$.
The following observation is a straightforward consequence of the definitions of $\delta_\Sigma$ and $A^\square$.

Observation 3.1.

For a matrix $A$ of size $p \times q$ and integers $0 \le a \le b < p$ and $0 \le c \le d < q$,

$$A_{a,c} + A_{b,d} + \delta_\Sigma(A_{[a..b][c..d]}) = A_{a,d} + A_{b,c}.$$

Proof.

By the definition of $A^\square$, we have $A_{i,j} + A_{i+1,j+1} + A^\square_{i,j} = A_{i,j+1} + A_{i+1,j}$ for all $i \in [a..b)$ and $j \in [c..d)$. The desired equality follows by summing up all these equalities.

An application of Observation 3.1 with $a = c = 0$ implies that every value of $A$ can be uniquely reconstructed from $A$'s core and the values in the topmost row and the leftmost column of $A$.

Definition 3.2.

The condensed representation of a matrix A consists of the core of A as well as the values in the topmost row and the leftmost column of A.
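As a sanity check of Definition 3.2, the following sketch (our own; it spends $\mathcal{O}(\delta(A))$ time per entry) recovers an arbitrary entry from the condensed representation by rearranging Observation 3.1 with $a = c = 0$; Fact 4.1 below reduces the query time to logarithmic.

```python
def entry_from_condensed(top_row, left_col, core_elems, i, j):
    """Recover A[i][j] from the condensed representation of A.

    top_row is A[0][0..q), left_col is A[0..p)[0], and core_elems is
    core(A). Observation 3.1 with a = c = 0, b = i, d = j rearranges to
        A[i][j] = A[i][0] + A[0][j] - A[0][0] - δΣ(A[0..i][0..j]),
    where the last term sums the core elements with i' < i and j' < j.
    """
    dominated = sum(v for (ii, jj, v) in core_elems if ii < i and jj < j)
    return left_col[i] + top_row[j] - top_row[0] - dominated
```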

Observation 3.3.

Any submatrix $A'$ (not necessarily contiguous) of a Monge matrix $A$ is Monge.

Proof.

By Observation 2.2, if $A$ is Monge, the Monge property holds for every (not necessarily contiguous) $2 \times 2$ submatrix of $A$. In particular, it holds for the contiguous $2 \times 2$ submatrices of $A'$, and thus $A'$ satisfies Definition 2.1.

We next show a crucial property of the witness matrix.

Lemma 3.4 ([48, Theorem 1]).

For any two Monge matrices, $A$ of size $p \times q$ and $B$ of size $q \times r$, the witness matrix $\mathcal{W}^{A,B}$ is non-decreasing along rows and columns.

Proof.

We prove that $\mathcal{W}^{A,B}$ is non-decreasing along columns; the claim for rows is analogous. Fix some $i \in [0..p-1)$ and $k \in [0..r)$. Let $j \coloneqq \mathcal{W}^{A,B}_{i,k}$ and $j' \in [0..j)$. As $j$ is the smallest witness for $(i,k)$, we have $A_{i,j'} + B_{j',k} > A_{i,j} + B_{j,k}$. Using this fact and the Monge property of $A$, we derive

$$\begin{aligned}
A_{i+1,j'} + B_{j',k} &\ge (A_{i,j'} + A_{i+1,j} - A_{i,j}) + B_{j',k} \\
&= (A_{i,j'} + B_{j',k}) + (A_{i+1,j} - A_{i,j}) \\
&> (A_{i,j} + B_{j,k}) + (A_{i+1,j} - A_{i,j}) \\
&= A_{i+1,j} + B_{j,k}.
\end{aligned}$$

Since $A_{i+1,j'} + B_{j',k} > A_{i+1,j} + B_{j,k}$ holds for all $j' \in [0..j)$, we conclude that $\mathcal{W}^{A,B}_{i+1,k} \ge j = \mathcal{W}^{A,B}_{i,k}$, as required.
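Lemma 3.4 is exactly the property that enables divide-and-conquer evaluation: once the smallest minimum position of a middle row is known, the rows above and below may restrict their search ranges. The sketch below (our own illustration, not the paper's algorithm) applies this to compute all row minima of an implicitly given Monge matrix with $\mathcal{O}((p+q)\log p)$ entry evaluations; the SMAWK algorithm [1] improves this to $\mathcal{O}(p+q)$. Running it on $M(i,j) = A_{i,j} + B_{j,k}$ for a fixed $k$ yields the $k$-th column of $A \otimes B$ together with its smallest witnesses.

```python
def monotone_row_minima(p, q, M):
    """Row minima (value, smallest argmin) of an implicit p×q Monge matrix.

    M(i, j) returns the (i, j) entry. Since the smallest argmins are
    non-decreasing from row to row (cf. Lemma 3.4), recursing on the
    middle row needs only O((p + q) log p) evaluations in total.
    """
    result = [None] * p

    def solve(top, bottom, left, right):
        if top > bottom:
            return
        mid = (top + bottom) // 2
        # min() over (value, index) pairs breaks ties by smallest argmin.
        best, arg = min((M(mid, j), j) for j in range(left, right + 1))
        result[mid] = (best, arg)
        solve(top, mid - 1, left, arg)       # rows above: argmin ≤ arg
        solve(mid + 1, bottom, arg, right)   # rows below: argmin ≥ arg

    solve(0, p - 1, 0, q - 1)
    return result
```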

We now show how to bound $\delta_\Sigma(A \otimes B)$ in terms of $\delta_\Sigma(A)$ and $\delta_\Sigma(B)$.

Lemma 3.5.

Let $A$ be a $p \times q$ Monge matrix, and let $B$ be a $q \times r$ Monge matrix. Then, $\delta_\Sigma(A \otimes B) \le \min\{\delta_\Sigma(A), \delta_\Sigma(B)\}$.

Proof.

Let $C \coloneqq A \otimes B$. Denote $j \coloneqq \mathcal{W}^{A,B}_{0,0}$ and $j' \coloneqq \mathcal{W}^{A,B}_{p-1,r-1}$, where $j \le j'$ due to Lemma 3.4. We have $C_{0,0} = A_{0,j} + B_{j,0}$ and $C_{p-1,r-1} = A_{p-1,j'} + B_{j',r-1}$. Furthermore, $C_{p-1,0} \le A_{p-1,j} + B_{j,0}$ and $C_{0,r-1} \le A_{0,j'} + B_{j',r-1}$ due to the definition of $C$. Hence, due to Observation 3.1, we obtain

$$\begin{aligned}
\delta_\Sigma(C) &= C_{p-1,0} + C_{0,r-1} - C_{0,0} - C_{p-1,r-1} \\
&\le (A_{p-1,j} + B_{j,0}) + (A_{0,j'} + B_{j',r-1}) - (A_{0,j} + B_{j,0}) - (A_{p-1,j'} + B_{j',r-1}) \\
&= A_{p-1,j} + A_{0,j'} - A_{0,j} - A_{p-1,j'} \\
&= \delta_\Sigma(A_{[0..p)[j..j']}) \\
&\le \delta_\Sigma(A).
\end{aligned}$$

The inequality $\delta_\Sigma(C) \le \delta_\Sigma(B)$ can be obtained using a symmetric argument.

The following corollary says that every core element of $A \otimes B$ can be attributed to at least one core element of $A$ and at least one core element of $B$.

Corollary 3.6.

Let $A$ be a $p \times q$ Monge matrix, let $B$ be a $q \times r$ Monge matrix, and let $C \coloneqq A \otimes B$. Consider integers $i \in [0..p-1)$ and $k \in [0..r-1)$ such that $C^\square_{i,k} \neq 0$. There exist integers $j_A, j_B \in [\mathcal{W}^{A,B}_{i,k}..\mathcal{W}^{A,B}_{i+1,k+1})$ such that $A^\square_{i,j_A} \neq 0$ and $B^\square_{j_B,k} \neq 0$.

Proof.

Let $j \coloneqq \mathcal{W}^{A,B}_{i,k}$ and $j' \coloneqq \mathcal{W}^{A,B}_{i+1,k+1}$. By the monotonicity of $\mathcal{W}^{A,B}$ (Lemma 3.4), we have $j \le \mathcal{W}^{A,B}_{i,k+1}, \mathcal{W}^{A,B}_{i+1,k} \le j'$. Thus, $C_{[i..i+1][k..k+1]} = A_{[i..i+1][j..j']} \otimes B_{[j..j'][k..k+1]}$. Due to Lemma 3.5, we have $0 < C^\square_{i,k} = \delta_\Sigma(C_{[i..i+1][k..k+1]}) \le \delta_\Sigma(A_{[i..i+1][j..j']})$. Thus, there exists a core element in $A_{[i..i+1)[j..j')}$.

Symmetrically, $0 < C^\square_{i,k} = \delta_\Sigma(C_{[i..i+1][k..k+1]}) \le \delta_\Sigma(B_{[j..j'][k..k+1]})$, and there exists a core element in $B_{[j..j')[k..k+1)}$.

We now use Corollary 3.6 to derive a bound on $\delta(A \otimes B)$ in terms of $\delta(A)$ and $\delta(B)$. The underlying idea is to show that every core element of $C$ is either the first or the last one (in the lexicographic order) that the mapping of Corollary 3.6 attributes to the corresponding core element of $A$ or $B$ (see Figure 1 for why the opposite would lead to a contradiction).

Theorem 1.1. [Restated, see original statement.]

Let $A$ be a $p \times q$ Monge matrix, and let $B$ be a $q \times r$ Monge matrix. We have $\delta(A \otimes B) \le 2 \cdot (\delta(A) + \delta(B))$.

Figure 1: We consider the matrix of witnesses $\mathcal{W}^{A,B}$, drawn as an array with rows in $[0..p)$ indexed from top to bottom and columns in $[0..r)$ indexed from left to right. In the cells, we write bounds on the corresponding witnesses. In the grid nodes, we write the two values in $[0..q-1)$ derived from Corollary 3.6 for each core element of $C$. The center of the picture corresponds to a core value $C^\square_{i,k}$, attributed to $A^\square_{i,j_A}$ and $B^\square_{j_B,k}$, respectively. For a proof by contradiction, suppose that there are some core elements to the left and to the right of $(i,k)$ attributed to $A^\square_{i,j_A}$ and some core elements above and below $(i,k)$ attributed to $B^\square_{j_B,k}$. The arrows represent implications (based on Corollary 3.6 and Lemma 3.4), from which a contradiction follows: $j_A < \mathcal{W}^{A,B}_{i+1,k} \le j_B < \mathcal{W}^{A,B}_{i,k+1} \le j_A$.

Proof.

Let $C \coloneqq A \otimes B$. Define a function $f_A : \operatorname{core}(C) \to \operatorname{core}(A)$ that maps every core element $(i,k,C^\square_{i,k}) \in \operatorname{core}(C)$ to $(i,j,A^\square_{i,j})$ for the smallest $j \in [\mathcal{W}^{A,B}_{i,k}..\mathcal{W}^{A,B}_{i+1,k+1})$ with $A^\square_{i,j} \neq 0$; such a $j$ exists due to Corollary 3.6. Analogously, we define a function $f_B : \operatorname{core}(C) \to \operatorname{core}(B)$ that maps every core element $(i,k,C^\square_{i,k}) \in \operatorname{core}(C)$ to $(j,k,B^\square_{j,k})$ for the smallest $j \in [\mathcal{W}^{A,B}_{i,k}..\mathcal{W}^{A,B}_{i+1,k+1})$ with $B^\square_{j,k} \neq 0$; again, such a $j$ exists due to Corollary 3.6.

Claim 3.7.

Every core element $c \in \operatorname{core}(C)$ is the lexicographically minimal or maximal one in its pre-image $f_A^{-1}(f_A(c))$ under $f_A$ or in its pre-image $f_B^{-1}(f_B(c))$ under $f_B$.

Proof.

For a proof by contradiction, pick an element $c = (i,k,C^\square_{i,k}) \in \operatorname{core}(C)$ violating the claim. Let $a \coloneqq f_A(c) = (i,j_A,A^\square_{i,j_A})$ and $b \coloneqq f_B(c) = (j_B,k,B^\square_{j_B,k})$. We make four symmetric arguments (all illustrated in Figure 1), with the first one presented in more detail.

  1. Since $c$ is not the minimal core element in $f_A^{-1}(a)$, we have $f_A(c') = a$ for some core element $c' = (i',k',C^\square_{i',k'}) \in \operatorname{core}(C)$ that precedes $c$ in the lexicographic order, denoted $c' \prec c$. Due to $f_A(c') = a = (i,j_A,A^\square_{i,j_A})$, we must have $i' = i$ and $j_A < \mathcal{W}^{A,B}_{i'+1,k'+1}$. Since $i' = i$, the relation $c' \prec c$ implies $k'+1 \le k$. The monotonicity of the witness matrix (Lemma 3.4) thus yields $\mathcal{W}^{A,B}_{i'+1,k'+1} \le \mathcal{W}^{A,B}_{i+1,k}$. Overall, we conclude that $j_A < \mathcal{W}^{A,B}_{i+1,k}$.

  2. Since $c$ is not the maximal core element in $f_A^{-1}(a)$, we have $f_A(c') = a$ for some core element $c' \succ c$. In particular, $c' = (i,k',C^\square_{i,k'}) \in \operatorname{core}(C)$ for some $k' > k$. Then, $\mathcal{W}^{A,B}_{i,k+1} \le \mathcal{W}^{A,B}_{i,k'} \le j_A$ follows from Lemma 3.4 and $f_A(c') = a$, respectively.

  3. Since $c$ is not the minimal core element in $f_B^{-1}(b)$, we have $f_B(c') = b$ for some core element $c' \prec c$. In particular, $c' = (i',k,C^\square_{i',k}) \in \operatorname{core}(C)$ for some $i' < i$. Then, $\mathcal{W}^{A,B}_{i,k+1} \ge \mathcal{W}^{A,B}_{i'+1,k+1} > j_B$ follows from Lemma 3.4 and $f_B(c') = b$, respectively.

  4. Since $c$ is not the maximal core element in $f_B^{-1}(b)$, we have $f_B(c') = b$ for some core element $c' \succ c$. In particular, $c' = (i',k,C^\square_{i',k}) \in \operatorname{core}(C)$ for some $i' > i$. Then, $\mathcal{W}^{A,B}_{i+1,k} \le \mathcal{W}^{A,B}_{i',k} \le j_B$ follows from Lemma 3.4 and $f_B(c') = b$, respectively.

Overall, we derive a contradiction: $j_A < \mathcal{W}^{A,B}_{i+1,k} \le j_B < \mathcal{W}^{A,B}_{i,k+1} \le j_A$. As every core element of $C$ is either the first or the last one in some pre-image under $f_A$ or $f_B$, we get that $\delta(C) \le 2 \cdot (\delta(A) + \delta(B))$.

Example 3.8.

Note that the inequality for the core size from Theorem 1.1 is weaker than the inequality for the core sum from Lemma 3.5. We claim that this weakening is not an artifact of our proof. Consider the following Monge matrices

$$A = \begin{pmatrix} 0 & 4 & 5 & 6 \\ 0 & 1 & 2 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 2 & 4 & 6 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 2 & 4 & 6 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

We have $C = A \otimes B$, $\delta(A) = 2$, $\delta(B) = 3$, and $6 = \delta(C) > \delta(A) + \delta(B) > \min\{\delta(A), \delta(B)\}$.
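This example can be checked mechanically by reusing min_plus_product, density_matrix, and delta from the sketches above:

```python
A = [[0, 4, 5, 6], [0, 1, 2, 3], [0, 1, 2, 0], [0, 1, 2, 0]]
B = [[0, 2, 4, 6], [0, 0, 2, 4], [0, 0, 0, 2], [0, 0, 0, 0]]

C, _ = min_plus_product(A, B)
assert C == [[0, 2, 4, 6], [0, 1, 2, 3], [0, 0, 0, 0], [0, 0, 0, 0]]
assert (delta(A), delta(B), delta(C)) == (2, 3, 6)   # δ(C) > δ(A) + δ(B)
```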

Remark 3.9.

To the best of our knowledge, Theorem 1.1 shows the first bound on the core size of $A \otimes B$ in terms of the core sizes of $A$ and $B$. Previously, such a bound was only known for unit-Monge matrices [41]. In particular, our bound allows simplifying some time complexities of already existing algorithms. For example, [34, Lemma 18] shows how to compute the product of two explicitly given Monge matrices $A$ and $B$ in time $\mathcal{O}(p \log q + q \log r + \delta(A \otimes B) \log q \log r + (\delta(A)+\delta(B)) \log^2 q)$. Assuming that $r \le \operatorname{poly}(q)$, Theorem 1.1 tells us that we can drop the third summand in the time complexity.

4 Core-Sparse Monge Matrix Min-Plus Multiplication Algorithm

In this section, we present an algorithm that, given the condensed representations of two Monge matrices, computes the condensed representation of their (min,+)-product. Even though the condensed representation itself does not provide efficient random access to the values of the matrix, the following fact justifies our choice of the representation of Monge matrices.

Fact 4.1 ([22, Lemma 4.4]).

There exists an algorithm that, given the condensed representation of a matrix $A$ of size $p \times q$, in time $\mathcal{O}(p + q + \delta(A)\log(1+\delta(A)))$ builds a core-based matrix oracle data structure $\mathsf{CMO}(A)$ that provides $\mathcal{O}(\log(2+\delta(A)))$-time random access to the entries of $A$.

Proof Sketch.

By Observation 3.1 applied with $a = c = 0$, it suffices to answer orthogonal range sum queries on top of $\operatorname{core}(A)$. According to [45, Theorem 3], this can be done with $\mathcal{O}(\delta(A)\log(1+\delta(A)))$ preprocessing time and $\mathcal{O}(\log(2+\delta(A)))$ query time.
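For intuition, the following self-contained sketch (our own; not the structure of [45]) realizes such an oracle with a merge-sort tree: it answers the dominance-sum queries from entry_from_condensed above in $\mathcal{O}(\log^2 \delta)$ time after $\mathcal{O}(\delta \log^2 \delta)$ preprocessing, whereas [45] reaches $\mathcal{O}(\log \delta)$ queries via fractional cascading.

```python
import bisect

class CoreMatrixOracle:
    """Random access to A from its condensed representation (a sketch).

    Entries are recovered via Observation 3.1 with a = c = 0; the
    dominance sum over core(A) is answered by a merge-sort tree.
    """

    def __init__(self, top_row, left_col, core_elems):
        self.top, self.left = top_row, left_col
        self.pts = sorted(core_elems)          # sorted by row i, then j
        self.rows = [i for (i, _, _) in self.pts]
        self.nodes = {}                        # (lo, hi) -> (cols, prefix sums)
        if self.pts:
            self._build(0, len(self.pts))

    def _build(self, lo, hi):
        cols = sorted((j, v) for (_, j, v) in self.pts[lo:hi])
        prefix = [0]
        for _, v in cols:
            prefix.append(prefix[-1] + v)
        self.nodes[lo, hi] = ([j for j, _ in cols], prefix)
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self._build(lo, mid)
            self._build(mid, hi)

    def _sum_prefix(self, lo, hi, pos, j):
        # Total value of core elements among pts[lo..hi) that also lie in
        # pts[0..pos) (i.e., have row < i) and have column index < j.
        if not self.pts or pos <= lo:
            return 0
        if hi <= pos:
            cols, prefix = self.nodes[lo, hi]
            return prefix[bisect.bisect_left(cols, j)]
        mid = (lo + hi) // 2
        return self._sum_prefix(lo, mid, pos, j) + self._sum_prefix(mid, hi, pos, j)

    def entry(self, i, j):
        """A[i][j] in O(log² δ) time."""
        pos = bisect.bisect_left(self.rows, i)    # core elements with row < i
        dom = self._sum_prefix(0, len(self.pts), pos, j)
        return self.left[i] + self.top[j] - self.top[0] - dom
```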

Definition 4.2.

Let $A$ be a $p \times q$ matrix. For every $i \in [0..p-1)$, denote $\operatorname{core}_{i,\star}(A) = \{(i,j,A^\square_{i,j}) : j \in [0..q-1),\ A^\square_{i,j} \neq 0\}$ and $\delta_{i,\star}(A) = |\operatorname{core}_{i,\star}(A)|$. Analogously, for every $j \in [0..q-1)$, denote $\operatorname{core}_{\star,j}(A) = \{(i,j,A^\square_{i,j}) : i \in [0..p-1),\ A^\square_{i,j} \neq 0\}$ and $\delta_{\star,j}(A) = |\operatorname{core}_{\star,j}(A)|$.

Observation 4.3.

$\delta(A) = \sum_{i \in [0..p-1)} \delta_{i,\star}(A) = \sum_{j \in [0..q-1)} \delta_{\star,j}(A)$.

We now design another matrix oracle as an alternative to 𝖢𝖬𝖮(A) for the situation in which one wants to query an entry of A that is adjacent to some other already known entry.

Lemma 4.4.

There is an algorithm that, given the condensed representation of a p×q matrix A, in time 𝒪(p+q+δ(A)) builds a local core oracle data structure 𝗅𝖼𝗈(A) with the following interface.

Boundary access:

given indices $i \in [0..p)$ and $j \in [0..q)$ such that $i = 0$ or $j = 0$, in time $\mathcal{O}(1)$ returns $A_{i,j}$.

Vertically adjacent recomputation:

given indices $i \in [0..p-1)$ and $j \in [0..q)$, in time $\mathcal{O}(\delta_{i,\star}(A)+1)$ returns $A_{i+1,j} - A_{i,j}$.

Horizontally adjacent recomputation:

given indices $i \in [0..p)$ and $j \in [0..q-1)$, in time $\mathcal{O}(\delta_{\star,j}(A)+1)$ returns $A_{i,j+1} - A_{i,j}$.

Proof.

The local core oracle data structure of $A$ stores all the values of the topmost row and the leftmost column of $A$, as well as two collections of lists: $\operatorname{core}_{i,\star}(A)$ for all $i \in [0..p-1)$ and $\operatorname{core}_{\star,j}(A)$ for all $j \in [0..q-1)$. The values of the topmost row and the leftmost column of $A$ are already given. The lists $\operatorname{core}_{i,\star}(A)$ and $\operatorname{core}_{\star,j}(A)$ can be computed in $\mathcal{O}(p+q+\delta(A))$ time from $\operatorname{core}(A)$. Hence, we can build $\mathsf{lco}(A)$ in time $\mathcal{O}(p+q+\delta(A))$.

Boundary access can be implemented trivially. We now show how to implement the vertically adjacent recomputation. Suppose that we are given $i \in [0..p-1)$ and $j \in [0..q)$. Due to Observation 3.1, we have $A_{i,j} + A_{i+1,0} = A_{i+1,j} + A_{i,0} + \delta_\Sigma(A_{[i..i+1][0..j]})$. Note that the values $A_{i,0}$ and $A_{i+1,0}$ can be computed in constant time using boundary access queries, and $\delta_\Sigma(A_{[i..i+1][0..j]})$ can be computed from $\operatorname{core}_{i,\star}(A)$ in time $\mathcal{O}(\delta_{i,\star}(A)+1)$. Hence, $A_{i+1,j} - A_{i,j}$ can be computed in time $\mathcal{O}(\delta_{i,\star}(A)+1)$.

The horizontally adjacent recomputation is implemented analogously, so that $A_{i,j+1} - A_{i,j}$ can be computed in time $\mathcal{O}(\delta_{\star,j}(A)+1)$.

Note that the vertically and horizontally adjacent recomputations allow computing the neighbors of any given entry $A_{i,j}$.
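A minimal Python sketch of Lemma 4.4 (ours; the lists are indexed as in Definition 4.2):

```python
class LocalCoreOracle:
    """Local core oracle lco(A) built from a condensed representation."""

    def __init__(self, top_row, left_col, core_elems):
        self.top, self.left = top_row, left_col
        p, q = len(left_col), len(top_row)
        self.by_row = [[] for _ in range(p - 1)]   # core_{i,*}(A)
        self.by_col = [[] for _ in range(q - 1)]   # core_{*,j}(A)
        for (i, j, v) in core_elems:
            self.by_row[i].append((j, v))
            self.by_col[j].append((i, v))

    def boundary(self, i, j):
        """A[i][j] for i = 0 or j = 0, in O(1) time."""
        return self.top[j] if i == 0 else self.left[i]

    def vertical_diff(self, i, j):
        """A[i+1][j] - A[i][j] in O(δ_{i,*}(A) + 1) time; by Observation 3.1,
        A[i+1][j] - A[i][j] = A[i+1][0] - A[i][0] - δΣ(A[i..i+1][0..j])."""
        span = sum(v for (jj, v) in self.by_row[i] if jj < j)
        return self.left[i + 1] - self.left[i] - span

    def horizontal_diff(self, i, j):
        """A[i][j+1] - A[i][j] in O(δ_{*,j}(A) + 1) time, analogously."""
        span = sum(v for (ii, v) in self.by_col[j] if ii < i)
        return self.top[j + 1] - self.top[j] - span
```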

Lemma 4.5.

Given the condensed representation of a $p \times q$ matrix $A$, the condensed representation of any contiguous submatrix $A'$ of $A$ can be computed in time $\mathcal{O}(p+q+\delta(A))$.

Proof.

Say, $A' = A_{[i..i'][j..j']}$. Note that $\operatorname{core}(A')$ can be obtained in time $\mathcal{O}(\delta(A))$ by filtering out all elements of $\operatorname{core}(A)$ that lie outside of $[i..i') \times [j..j')$. It remains to obtain the topmost row and the leftmost column of $A'$. In time $\mathcal{O}(p+q+\delta(A))$, we build $\mathsf{lco}(A)$ of Lemma 4.4. By starting from $A_{i,0}$ and repeatedly applying the horizontally adjacent recomputation, we can compute $A_{[i..i][0..q)}$ (and thus $A_{[i..i][j..j']}$) in time $\mathcal{O}(\sum_{j \in [0..q)} (\delta_{\star,j}(A)+1)) = \mathcal{O}(q+\delta(A))$. The leftmost column of $A'$ can be computed in time $\mathcal{O}(p+\delta(A))$ analogously.

We now show a helper lemma that compresses “ultra-sparse” Monge matrices to limit their dimensions by the size of their core.

Lemma 4.6 (Matrix compression).

There are two algorithms, compress and decompress, with the following properties. The compress algorithm, given the condensed representations of a $p \times q$ Monge matrix $A$ and a $q \times r$ Monge matrix $B$, in time $\mathcal{O}(p+q+r+\delta(A)+\delta(B))$, builds a $p' \times q'$ Monge matrix $A'$ and a $q' \times r'$ Monge matrix $B'$ such that $p' \le \delta(A)+1$, $q' \le \delta(A)+\delta(B)+1$, $r' \le \delta(B)+1$, $\delta(A') = \delta(A)$, and $\delta(B') = \delta(B)$. The decompress algorithm, given the condensed representations of $A$, $B$, and $A' \otimes B'$, where $(A',B') = \operatorname{compress}(A,B)$, computes the condensed representation of $A \otimes B$ in time $\mathcal{O}(p+r+\delta(A)+\delta(B))$.

Proof Sketch.

The compress algorithm deletes all rows of $A$ that do not contain any core elements and all columns of $B$ that do not contain any core elements. This way, $p' \le \delta(A)+1$ and $r' \le \delta(B)+1$ are guaranteed. Furthermore, all core elements of these two matrices are preserved. If there are no core elements between rows $i$ and $i+1$ of $A$, the corresponding entries of these two rows differ by some constant $c$, and thus all the entries of row $i+1$ of $A \otimes B$ can be obtained from the corresponding entries of row $i$ of $A \otimes B$ by adding $c$. Since we can recover $c$ from $A$, no information about the answer is "lost" when deleting such rows. An analogous property holds for the removed columns of $B$. The decompress algorithm reverses the row and column removals performed by the compress algorithm.

The compress algorithm also reduces the number $q$ of columns of $A$ and rows of $B$ to $q' \le \delta(A)+\delta(B)+1$. Observe that if there are no core elements between columns $j$ and $j+1$ of $A$ and between rows $j$ and $j+1$ of $B$, then the values $A_{i,j} + B_{j,k}$ and $A_{i,j+1} + B_{j+1,k}$ differ by a constant independent of $i$ and $k$. Depending on the sign of this constant difference, we can delete either the $j$-th column of $A$ and the $j$-th row of $B$, or the $(j+1)$-st column of $A$ and the $(j+1)$-st row of $B$, without changing the min-plus product of these matrices. By repeating this process for all indices $j$ that do not "contribute" to the cores of $A$ and $B$, we get $q' \le \delta(A)+\delta(B)+1$. The formal proof can be found in the full version of the paper [21].
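The following sketch (ours) illustrates only the row-deletion step of Lemma 4.6, on explicit matrices for readability; the paper's version works on condensed representations and additionally compresses the columns of $A$ jointly with the rows of $B$ as described above. It reuses density_matrix from Section 2.

```python
def compress_rows(A):
    """Row-compression step of Lemma 4.6 on an explicit matrix (sketch).

    Keeps row 0 and every row i+1 with a core element between rows i and
    i+1. Every deleted row equals the nearest kept row above it plus a
    constant, so the corresponding row of A ⊗ B is recovered the same way.
    Returns the compressed matrix and, per original row, the pair
    (index of its kept row in the compressed matrix, additive offset).
    """
    D = density_matrix(A)
    keep = [0] + [i + 1 for i in range(len(D)) if any(D[i])]
    mapping = []
    for i in range(len(A)):
        k = max(t for t, row in enumerate(keep) if row <= i)
        mapping.append((k, A[i][0] - A[keep[k]][0]))
    return [A[i] for i in keep], mapping

def decompress_rows(C_rows, mapping):
    """Recover all rows of A ⊗ B from the product rows of the kept rows."""
    return [[x + off for x in C_rows[k]] for (k, off) in mapping]
```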

Finally, we show our main Monge matrix multiplication algorithm.

Theorem 1.2. [Restated, see original statement.]

There is a (deterministic) algorithm that, given the condensed representations of a $p \times q$ Monge matrix $A$ and a $q \times r$ Monge matrix $B$, in time $\mathcal{O}(p + q + r + (\delta(A)+\delta(B)) \cdot \log(1+\delta(A)+\delta(B)))$ computes the condensed representation of $A \otimes B$. (In the technical sections of the paper, we use the $\mathcal{O}(\cdot)$-notation conservatively. Specifically, we interpret $\mathcal{O}(f(x_1,\dots,x_k))$ as the set of functions $g(x_1,\dots,x_k)$ for which there are constants $c_g, N_g > 0$ such that $g(x_1,\dots,x_k) \le c_g \cdot f(x_1,\dots,x_k)$ holds for all valid tuples $(x_1,\dots,x_k)$ satisfying $\max_i x_i \ge N_g$. Accordingly, whenever the expression inside $\mathcal{O}(\cdot)$ depends on multiple parameters, we sometimes add 1 or 2 to the arguments of logarithms to ensure formal correctness in corner cases.)

Proof Sketch.

We design a recursive divide-and-conquer procedure $\mathsf{Multiply}(A,B)$ (see Algorithm 1 for the details) and solve the problem by initially running $\mathsf{Multiply}(M_1,M_2)$. Given the matrices $A$ and $B$, we first apply the compress algorithm of Lemma 4.6 to compress $A$ and $B$ into a $p' \times q'$ matrix $A'$ and a $q' \times r'$ matrix $B'$, respectively. We then compute the condensed representation of $C' \coloneqq A' \otimes B'$ and finally use the decompress algorithm of Lemma 4.6 to decompress $C'$ into $A \otimes B$.

If $p' = r' = 1$, we compute the matrix $C'$ trivially. Otherwise, we pick a splitting point $m \in (0..q')$, split the matrix $A'$ vertically into the matrix $A_L$ of size $p' \times m$ and the matrix $A_R$ of size $p' \times (q'-m)$, and split the matrix $B'$ horizontally into the matrix $B_L$ of size $m \times r'$ and the matrix $B_R$ of size $(q'-m) \times r'$. We pick $m$ in such a way that it splits the cores of $A'$ and $B'$ almost equally across $(A_L, B_L)$ and $(A_R, B_R)$, that is, $\delta(A_L)+\delta(B_L)$ and $\delta(A_R)+\delta(B_R)$ are (up to rounding) at most $(\delta(A')+\delta(B'))/2$. We recursively compute the condensed representations of the matrices $C_L \coloneqq A_L \otimes B_L$ and $C_R \coloneqq A_R \otimes B_R$. The resulting matrix $C'$ can be obtained as the element-wise minimum of $C_L$ and $C_R$. Furthermore, one can see that, due to Lemma 3.4, in some top-left region of $C'$, the values are equal to the corresponding values of $C_L$, and in the remaining bottom-right region of $C'$, the values are equal to the corresponding values of $C_R$; see Figure 2. We find the boundary between these two regions by starting in the bottom-left corner of $C'$ and traversing it towards the top-right corner along this boundary, using $\mathsf{lco}(C_L)$ and $\mathsf{lco}(C_R)$ to sequentially compute the subsequent entries along the way. Having determined the boundary, we construct the core of $C'$ by picking the core elements of $C_L$ located to the top-left of this boundary, picking the core elements of $C_R$ located to the bottom-right of this boundary, and computing the density values of $C'$ on the boundary from scratch. The values in the topmost row and the leftmost column of $C'$ can be trivially obtained as element-wise minima of the corresponding values in $C_L$ and $C_R$. This concludes the recursive procedure; see Algorithm 1 for the pseudocode. The algorithm follows the classical divide-and-conquer framework: the cores halve with every level of recursion, while Theorem 1.1 bounds the sizes of the cores produced along the way, so its time complexity can be easily derived. The full description of the algorithm, the proof of its correctness, and the formal analysis of its time complexity can be found in the full version of the paper [21].

Algorithm 1: The algorithm from Theorem 1.2. Given the condensed representations of the Monge matrices $A$ and $B$, the algorithm returns the condensed representation of $A \otimes B$.
Figure 2: An example of how $C'$ is obtained from $C_L$ and $C_R$. The blue ladder represents the border between the values inherited from $C_L$ and the values inherited from $C_R$. Ticks correspond to the core elements: the green ticks are inherited from $C_L$, the red ticks are inherited from $C_R$, and the blue ticks represent the core elements computed from scratch.
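The recursion shape can be summarized in a few lines of Python (our own simplified sketch; it works on explicit matrices and merges $C_L$ and $C_R$ by a naive element-wise minimum, so it has none of the claimed efficiency, which requires condensed representations, Lemma 4.6, a core-balancing split, and the boundary traversal with the lco oracles):

```python
def multiply(A, B):
    """Divide-and-conquer structure of Multiply (simplified sketch)."""
    p, q, r = len(A), len(B), len(B[0])
    if q == 1:  # trivial base case of this simplified version
        return [[A[i][0] + B[0][k] for k in range(r)] for i in range(p)]
    m = q // 2  # the real algorithm picks m to balance the split cores
    AL, AR = [row[:m] for row in A], [row[m:] for row in A]
    BL, BR = B[:m], B[m:]
    CL, CR = multiply(AL, BL), multiply(AR, BR)
    # By Lemma 3.4, C_L wins in a top-left region and C_R in the rest;
    # the efficient algorithm traverses that boundary instead of this min.
    return [[min(a, b) for a, b in zip(rl, rr)] for rl, rr in zip(CL, CR)]
```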

If the (min,+)-product of matrices represents the lengths of the shortest paths in a graph, the corresponding witness matrix allows computing the underlying shortest paths themselves. For that, we extend the presented algorithm to allow witness reconstruction.

Theorem 4.7.

The algorithm of Theorem 1.2 can be extended so that, within the same time complexity, it also builds a data structure that takes $\mathcal{O}(n_1+n_2+n_3+\delta(M_1)+\delta(M_2))$ space and provides $\mathcal{O}(\log(2+\delta(M_1)+\delta(M_2)))$-time oracle access to $\mathcal{W}^{M_1,M_2}$, where $M_1$ and $M_2$ denote the input matrices of sizes $n_1 \times n_2$ and $n_2 \times n_3$, respectively.

Proof Sketch.

We slightly modify the algorithm of Theorem 1.2. For the leaf recursive calls with $p' = r' = 1$, we explicitly store the minimal witness of the only entry of $C'$. In the non-leaf recursive calls, we store the correspondence between the indices of the compressed matrices $A', B', C'$ and the decompressed matrices $A, B, C$, as well as the border separating the values of $C_L$ and $C_R$ in $C'$ (the blue curve of Figure 2). Naively, this data takes space proportional to the time complexity of Theorem 1.2. Nevertheless, using bit-masks equipped with rank/select functionality, the total space complexity can be brought down to linear in the size of the input. To answer a query, we descend the recursion tree. In constant time, we can find the entry of $C'$ corresponding to the queried entry of $C$ and decide on which side of the border separating the values of $C_L$ and $C_R$ the queried entry of $C'$ lies. After that, we recurse into either $A_L \otimes B_L$ or $A_R \otimes B_R$. In the terminal recursive call with $p' = r' = 1$, we return the minimal witness, which is stored explicitly. The query time is proportional to the depth of the recursion. See the full version of the paper [21] for a formal description of the algorithm.

5 Range LIS Queries: Sketch

One of the original applications of Tiskin's procedure for simple unit-Monge matrix multiplication [38, 41] is an algorithm that preprocesses a given sequence $(s[i])_{i \in [0..n)}$ of integers in $\mathcal{O}(n \log^2 n)$ time so that Range Longest Increasing Subsequence (Range LIS) queries can be answered in $\mathcal{O}(\log n)$ time. We show an alternative way of obtaining the same result using Theorem 1.2 and avoiding the seaweed braid formalism. The extension of Theorem 4.7 allows recovering the underlying longest increasing subsequence of length $\ell$ in $\mathcal{O}(\ell \log^2 n)$ time. Further $\mathcal{O}(n \log^3 n)$-time preprocessing allows for $\mathcal{O}(\ell)$-time recovery, which improves upon the result of [27].

Without loss of generality, we assume that $(s[i])_{i \in [0..n)}$ is a permutation of $[0..n)$. (In what follows, we reserve the word "permutation" for permutations of $[0..m)$ for some $m \in \mathbb{Z}_+$.) We use a popular tool for string similarity problems and interpret range LIS queries as computing the longest path between some pair of vertices of a corresponding alignment graph $G_s$; see Figure 3. The crucial property of $G_s$ is that it is planar, and thus, due to [18, Section 2.3], the matrix $M_s$ of the longest distances from the vertices in the bottom row to the vertices in the top row of this graph is anti-Monge. (Note that, in reality, some entries of this matrix are infinite. In the formal description of the algorithm, we augment the alignment graph so that the finite entries of this matrix are preserved and the infinite entries become finite.) Thus, the problem of constructing a data structure for range LIS queries is reduced to the problem of computing the condensed representation of $M_s$.

Figure 3: The alignment graph for $s = [1,0,3,2,4]$, with red edges of weight 0 and green edges of weight 1. The length of the longest path from the $i$-th vertex in the bottom row to the $j$-th vertex in the top row of this graph, for $i < j$, is equal to $\mathsf{LIS}(s[i..j))$.
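To make the anti-Monge structure of $M_s$ tangible, the following sketch (ours; it computes only the finite entries and skips the augmentation mentioned above) fills $M_s[i][j] = \mathsf{LIS}(s[i..j))$ naively and spot-checks the anti-Monge inequality on the permutation from Figure 3.

```python
from bisect import bisect_left
import itertools

def lis_length(seq):
    """Classic O(m log m) patience-sorting LIS length."""
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def range_lis_matrix(s):
    """Finite part of M_s computed naively: M[i, j] = LIS(s[i..j)) for i <= j."""
    n = len(s)
    return {(i, j): lis_length(s[i:j])
            for i in range(n + 1) for j in range(i, n + 1)}

# Spot-check M[i,j] + M[i+1,j+1] >= M[i,j+1] + M[i+1,j] (anti-Monge)
# on the quadruples whose four entries are all finite without augmentation.
s = [1, 0, 3, 2, 4]          # the permutation from Figure 3
M = range_lis_matrix(s)
for i, j in itertools.product(range(len(s)), repeat=2):
    if i + 1 <= j:
        assert M[i, j] + M[i + 1, j + 1] >= M[i, j + 1] + M[i + 1, j]
```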

As $\mathsf{LIS}(s[i..j)) \le n$ holds for all $i,j \in [0..n]$ with $i < j$, all entries of $M_s$ are bounded by $\mathcal{O}(n)$, and thus $\delta(M_s) = \mathcal{O}(n)$ holds due to Observation 3.1 and the fact that $\delta(M_s) \le \delta_\Sigma(M_s)$ holds for integer-valued matrices. We compute $M_s$ in a divide-and-conquer fashion. We split the sequence $s$ into subsequences $s_{\mathrm{lo}}$ and $s_{\mathrm{hi}}$ containing all values in $[0..\frac{n}{2})$ and $[\frac{n}{2}..n)$, respectively, and recursively compute $M_{s_{\mathrm{lo}}}$ and $M_{s_{\mathrm{hi}}}$. After that, we note that $G_{s_{\mathrm{lo}}}$ and $G_{s_{\mathrm{hi}}}$ are essentially compressed versions of the lower half and the upper half of $G_s$, respectively. Based on this, we transform $M_{s_{\mathrm{lo}}}$ in $\mathcal{O}(n)$ time into the matrix of the longest distances from the vertices in the bottom row of $G_s$ to the vertices in the middle row of $G_s$. Analogously, we transform $M_{s_{\mathrm{hi}}}$ into the matrix of the longest distances from the middle row of $G_s$ to the top row of $G_s$. To obtain $M_s$, it remains to $(\max,+)$-multiply these two matrices using Theorem 1.2. Every single recursive call takes $\mathcal{O}(n \log n)$ time, dominated by the algorithm of Theorem 1.2. The whole divide-and-conquer procedure takes $\mathcal{O}(n \log^2 n)$ time. Given the condensed representation of $M_s$, we use Fact 4.1 to create an oracle for $\mathcal{O}(\log n)$-time range LIS queries, thus replicating the result of [41].

Compared to the results of [41], this algorithm operates directly on the local LIS values and thus can be easily converted into an algorithm for the reporting version of range LIS queries, where we want to not only find the length of the longest path in the alignment graph but to also compute the structure of the underlying path itself. For that, we simply use the witness reconstruction oracle of Theorem 4.7 to find the midpoint of the path and descend the recursion tree. This costs $\mathcal{O}(\log n)$ time per recursive call and allows reconstructing the entire length-$\ell$ LIS in time $\mathcal{O}(\ell \log^2 n)$. A similar recursive LIS reporting scheme is used in [9].

By treating increasing subsequences of length at most $\log^2 n$ separately, we further obtain an algorithm with $\mathcal{O}(n \log^3 n)$ preprocessing time and $\mathcal{O}(\ell)$ reporting time. It improves upon the result of [27], with its $\mathcal{O}(n^{3/2}\operatorname{polylog} n)$-time preprocessing and $\mathcal{O}(\ell + n^{1/2}\operatorname{polylog} n)$-time reporting.

The formal proofs of the results in this section are given in the full version of the paper [21].

References

  • [1] Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter W. Shor, and Robert E. Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2:195–208, 1987. doi:10.1007/BF01840359.
  • [2] Noga Alon, Zvi Galil, and Oded Margalit. On the exponent of the all pairs shortest path problem. Journal of Computer and System Sciences, 54(2):255–262, 1997. doi:10.1006/JCSS.1997.1388.
  • [3] Alberto Apostolico, Mikhail J. Atallah, Lawrence L. Larmore, and Scott McFaddin. Efficient parallel algorithms for string editing and related problems. SIAM Journal on Computing, 19(5):968–988, 1990. doi:10.1137/0219066.
  • [4] Gary Benson. A space efficient algorithm for finding the best nonoverlapping alignment score. Theoretical Computer Science, 145(1&2):357–369, 1995. doi:10.1016/0304-3975(95)92848-R.
  • [5] Glencora Borradaile, Philip N. Klein, Shay Mozes, Yahav Nussbaum, and Christian Wulff-Nilsen. Multiple-source multiple-sink maximum flow in directed planar graphs in near-linear time. SIAM Journal on Computing, 46(4):1280–1303, 2017. doi:10.1137/15M1042929.
  • [6] Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM Journal on Computing, 48(2):481–512, 2019. doi:10.1137/17M112720X.
  • [7] Rainer E. Burkard. Monge properties, discrete convexity and applications. European Journal of Operational Research, 176(1):1–14, 2007. doi:10.1016/J.EJOR.2005.04.050.
  • [8] Rainer E. Burkard, Bettina Klinz, and Rüdiger Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95–161, 1996. doi:10.1016/0166-218X(95)00103-X.
  • [9] Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Nearly optimal parallel algorithms for longest increasing subsequence. In Kunal Agrawal and Julian Shun, editors, 35th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2023, pages 249–259. ACM, 2023. doi:10.1145/3558481.3591078.
  • [10] Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM Journal on Computing, 39(5):2075–2089, 2010. doi:10.1137/08071990X.
  • [11] Panagiotis Charalampopoulos, Paweł Gawrychowski, Yaowei Long, Shay Mozes, Seth Pettie, Oren Weimann, and Christian Wulff-Nilsen. Almost optimal exact distance oracles for planar graphs. Journal of the ACM, 70(2):12:1–12:50, 2023. doi:10.1145/3580474.
  • [12] Panagiotis Charalampopoulos, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. An almost optimal edit distance oracle. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, volume 198 of LIPIcs, pages 48:1–48:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ICALP.2021.48.
  • [13] Panagiotis Charalampopoulos, Tomasz Kociumaka, and Shay Mozes. Dynamic string alignment. In Inge Li Gørtz and Oren Weimann, editors, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pages 9:1–9:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.CPM.2020.9.
  • [14] Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster pattern matching under edit distance: A reduction to dynamic puzzle matching and the seaweed monoid of permutation matrices. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pages 698–707. IEEE, 2022. doi:10.1109/FOCS54457.2022.00072.
  • [15] Shucheng Chi, Ran Duan, and Tianle Xie. Faster algorithms for bounded-difference min-plus product. In Joseph (Seffi) Naor and Niv Buchbinder, editors, 33rd ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 1435–1447. SIAM, 2022. doi:10.1137/1.9781611977073.60.
  • [16] Shucheng Chi, Ran Duan, Tianle Xie, and Tianyi Zhang. Faster min-plus product for monotone instances. In Stefano Leonardi and Anupam Gupta, editors, 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pages 1529–1542. ACM, 2022. doi:10.1145/3519935.3520057.
  • [17] Anita Dürr. Improved bounds for rectangular monotone min-plus product and applications. Information Processing Letters, 181:106358, 2023. doi:10.1016/J.IPL.2023.106358.
  • [18] Jittat Fakcharoenphol and Satish Rao. Planar graphs, negative weight edges, shortest paths, and near linear time. Journal of Computer and System Sciences, 72(5):868–889, 2006. doi:10.1016/j.jcss.2005.05.007.
  • [19] Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, and Barna Saha. How compression and approximation affect efficiency in string distance measures. In Joseph (Seffi) Naor and Niv Buchbinder, editors, 33rd ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 2867–2919. SIAM, 2022. doi:10.1137/1.9781611977073.112.
  • [20] Paweł Gawrychowski. Faster algorithm for computing the edit distance between SLP-compressed strings. In Liliana Calderón-Benavides, Cristina N. González-Caro, Edgar Chávez, and Nivio Ziviani, editors, 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012, volume 7608 of LNCS, pages 229–236. Springer, 2012. doi:10.1007/978-3-642-34109-0_24.
  • [21] Paweł Gawrychowski, Egor Gorbachev, and Tomasz Kociumaka. Core-sparse Monge matrix multiplication: Improved algorithm and applications, 2024. arXiv:2408.04613v2.
  • [22] Egor Gorbachev and Tomasz Kociumaka. Bounded edit distance: Optimal static and dynamic algorithms for small integer weights. In 57th Annual ACM Symposium on Theory of Computing, STOC 2025, pages 2157–2166, New York, NY, USA, 2025. Association for Computing Machinery. doi:10.1145/3717823.3718168.
  • [23] Yuzhou Gu, Adam Polak, Virginia Vassilevska Williams, and Yinzhan Xu. Faster monotone min-plus product, range mode, and single source replacement paths. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, volume 198 of LIPIcs, pages 75:1–75:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ICALP.2021.75.
  • [24] Danny Hermelin, Gad M. Landau, Shir Landau, and Oren Weimann. Unified compression-based acceleration of edit-distance computation. Algorithmica, 65(2):339–353, 2013. doi:10.1007/S00453-011-9590-6.
  • [25] Giuseppe F. Italiano, Adam Karczmarz, Jakub Łącki, and Piotr Sankowski. Decremental single-source reachability in planar digraphs. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1108–1121. ACM, 2017. doi:10.1145/3055399.3055480.
  • [26] Sampath Kannan and Eugene W. Myers. An algorithm for locating nonoverlapping regions of maximum alignment score. SIAM Journal on Computing, 25(3):648–662, 1996. doi:10.1137/S0097539794262677.
  • [27] Karthik C. S. and Saladi Rahul. Range longest increasing subsequence and its relatives: Beating quadratic barrier and approaching optimality, 2024. doi:10.48550/arXiv.2404.04795.
  • [28] Tomasz Kociumaka and Saeed Seddighin. Improved dynamic algorithms for longest increasing subsequence. In Samir Khuller and Virginia Vassilevska Williams, editors, 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 640–653. ACM, 2021. doi:10.1145/3406325.3451026.
  • [29] Jaehyun Koo. On range LIS queries, 2023. Accessed: 2025-04-22. URL: https://codeforces.com/blog/entry/111625.
  • [30] Jaehyun Koo. An optimal MPC algorithm for subunit-Monge matrix multiplication, with applications to LIS. In Kunal Agrawal and Erez Petrank, editors, 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024, pages 145–154. ACM, 2024. doi:10.1145/3626183.3659974.
  • [31] Peter Krusche and Alexander Tiskin. New algorithms for efficient parallel string comparison. In Friedhelm Meyer auf der Heide and Cynthia A. Phillips, editors, 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2010, pages 209–216. ACM, 2010. doi:10.1145/1810479.1810521.
  • [32] Gad M. Landau and Michal Ziv-Ukelson. On the common substring alignment problem. Journal of Algorithms, 41(2):338–359, 2001. doi:10.1006/JAGM.2001.1191.
  • [33] Nikita Mishin, Daniil Berezun, and Alexander Tiskin. Efficient parallel algorithms for string comparison. In Xian-He Sun, Sameer Shende, Laxmikant V. Kalé, and Yong Chen, editors, 50th International Conference on Parallel Processing, ICPP 2021, pages 50:1–50:10. ACM, 2021. doi:10.1145/3472456.3472489.
  • [34] Luís M. S. Russo. Monge properties of sequence alignment. Theoretical Computer Science, 423:30–49, 2012. doi:10.1016/J.TCS.2011.12.068.
  • [35] Yoshifumi Sakai. A substring-substring LCS data structure. Theoretical Computer Science, 753:16–34, 2019. doi:10.1016/J.TCS.2018.06.034.
  • [36] Yoshifumi Sakai. A data structure for substring-substring LCS length queries. Theoretical Computer Science, 911:41–54, 2022. doi:10.1016/J.TCS.2022.02.004.
  • [37] Jeanette P. Schmidt. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM Journal on Computing, 27(4):972–992, 1998. doi:10.1137/S0097539795288489.
  • [38] Alexander Tiskin. Semi-local string comparison: algorithmic techniques and applications, 2007. arXiv:0707.3619.
  • [39] Alexander Tiskin. Semi-local longest common subsequences in subquadratic time. Journal of Discrete Algorithms, 6(4):570–581, 2008. doi:10.1016/J.JDA.2008.07.001.
  • [40] Alexander Tiskin. Threshold approximate matching in grammar-compressed strings. In Jan Holub and Jan Zdárek, editors, Prague Stringology Conference 2014, pages 124–138. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2014. URL: http://www.stringology.org/event/2014/p12.html.
  • [41] Alexander Tiskin. Fast distance multiplication of unit-Monge matrices. Algorithmica, 71(4):859–888, 2015. doi:10.1007/S00453-013-9830-Z.
  • [42] Alexandre Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571–603, 2008. doi:10.1007/S11786-007-0033-3.
  • [43] Virginia Vassilevska and Ryan Williams. Finding a maximum weight triangle in $n^{3-\delta}$ time, with applications. In Jon M. Kleinberg, editor, 38th Annual ACM Symposium on Theory of Computing, STOC 2006, pages 225–231. ACM, 2006. doi:10.1145/1132516.1132550.
  • [44] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In International Congress of Mathematicians, ICM 2018, pages 3447–3487. World Scientific, 2018. doi:10.1142/9789813272880_0188.
  • [45] Dan E. Willard. New data structures for orthogonal range queries. SIAM Journal on Computing, 14(1):232–253, 1985. doi:10.1137/0214019.
  • [46] R. Ryan Williams. Faster all-pairs shortest paths via circuit complexity. SIAM Journal on Computing, 47(5):1965–1985, 2018. doi:10.1137/15M1024524.
  • [47] Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less structured matrices, with applications. In Shuchi Chawla, editor, 31st ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, pages 12–29. SIAM, 2020. doi:10.1137/1.9781611975994.2.
  • [48] F. Frances Yao. Speed-up in dynamic programming. SIAM Journal on Algebraic Discrete Methods, 3(4):532–540, 1982. doi:10.1137/0603055.
  • [49] Raphael Yuster. Efficient algorithms on sets of permutations, dominance, and real-weighted APSP. In Claire Mathieu, editor, 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, pages 950–957. SIAM, 2009. doi:10.1137/1.9781611973068.103.