Core-Sparse Monge Matrix Multiplication
Abstract
Min-plus matrix multiplication is a fundamental tool for designing algorithms operating on distances in graphs and different problems solvable by dynamic programming. We know that, assuming the APSP hypothesis, no subcubic-time algorithm exists for the case of general matrices. However, in many applications the matrices admit certain structural properties that can be used to design faster algorithms. For example, when considering a planar graph, one often works with a Monge matrix $A$, meaning that the density matrix $A^\square$ has non-negative entries, that is, $A^\square[i][j] = A[i][j+1] + A[i+1][j] - A[i][j] - A[i+1][j+1] \ge 0$. The min-plus product of two $n \times n$ Monge matrices can be computed in $\mathcal{O}(n^2)$ time using the famous SMAWK algorithm.
In applications such as longest common subsequence, edit distance, and longest increasing subsequence, the matrices are even more structured, as observed by Tiskin [J. Discrete Algorithms, 2008]: they are (or can be converted to) simple unit-Monge matrices, meaning that the density matrix is a permutation matrix and, furthermore, the first column and the last row of the matrix consist of only zeroes. Such matrices admit an implicit representation of size $\mathcal{O}(n)$ and, as shown by Tiskin [SODA 2010 & Algorithmica, 2015], their min-plus product can be computed in $\mathcal{O}(n \log n)$ time. Russo [SPIRE 2010 & Theor. Comput. Sci., 2012] identified a general structural property of matrices that admit such efficient representation and min-plus multiplication algorithms: the core size $\delta$, defined as the number of non-zero entries in the density matrices of the input and output matrices. He provided an adaptive implementation of the SMAWK algorithm that runs in $\tilde{\mathcal{O}}(n + \delta)$ time, with the exact polylogarithmic factor depending on the representation of the input matrices.
In this work, we further investigate the core size as the parameter that enables efficient min-plus matrix multiplication. On the combinatorial side, we provide a (linear) bound on the core size of the product matrix in terms of the core sizes of the input matrices. On the algorithmic side, we generalize Tiskin's algorithm (but, arguably, with a more elementary analysis) to solve the core-sparse Monge matrix multiplication problem in $\mathcal{O}((n + \delta) \log n)$ time, matching the complexity for simple unit-Monge matrices. As witnessed by the recent work of Gorbachev and Kociumaka [STOC'25] for edit distance with integer weights, our generalization opens up the possibility of speed-ups for weighted sequence alignment problems. Furthermore, our multiplication algorithm is also capable of producing an efficient data structure for recovering the witness for any given entry of the output matrix. This allows us, for example, to preprocess an integer array of size $n$ in $\tilde{\mathcal{O}}(n)$ time so that the longest increasing subsequence of any sub-array can be reconstructed in $\mathcal{O}(\ell + \log n)$ time, where $\ell$ is the length of the reported subsequence. In comparison, Karthik C. S. and Rahul [arXiv, 2024] recently achieved $\tilde{\mathcal{O}}(\ell)$-time reporting after $\tilde{\mathcal{O}}(n^{3/2})$-time preprocessing.
Keywords and phrases: Min-plus matrix multiplication, Monge matrix, longest increasing subsequence
Funding: Egor Gorbachev: This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 850979).
2012 ACM Subject Classification: Theory of computation → Design and analysis of algorithms
Editors: Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The min-plus product (also known as the distance product or the tropical product) of two matrices $A$ and $B$ is defined as a matrix $C = A \otimes B$ such that $C[i][k] = \min_j (A[i][j] + B[j][k])$. The task of computing the min-plus product of two $n \times n$ matrices can be solved in $n^3 / 2^{\Omega(\sqrt{\log n})}$ time [46], and it is fine-grained equivalent to the All-Pairs Shortest Paths (APSP) problem [44], asking to compute the distances between every pair of vertices in a directed weighted graph on $n$ vertices. While it is conjectured that APSP, and hence also the min-plus product, do not admit $\mathcal{O}(n^{3-\varepsilon})$-time solutions for any constant $\varepsilon > 0$, faster algorithms exist for many special cases arising in numerous applications of the min-plus product; see, e.g., [2, 43, 49, 10, 6, 47, 23, 15, 16, 17]. Most of the underlying procedures rely on fast matrix multiplication for the standard $(+, \times)$-product.
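To make the definition concrete, here is a minimal Python sketch of the min-plus product computed directly from the definition (cubic time in general); the helper name min_plus and the example matrices are ours, for illustration only.

```python
def min_plus(A, B):
    """C[i][k] = min_j (A[i][j] + B[j][k]); cubic time in general."""
    p, q, r = len(A), len(B), len(B[0])
    assert all(len(row) == q for row in A), "inner dimensions must match"
    return [[min(A[i][j] + B[j][k] for j in range(q)) for k in range(r)]
            for i in range(p)]

# Distances through an intermediate layer: C[i][k] picks the best midpoint j.
A = [[0, 2], [1, 0]]
B = [[0, 3], [1, 0]]
print(min_plus(A, B))  # [[0, 2], [1, 0]]
```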
Monge matrices constitute a notable exception: An $n \times n$ matrix $A$ is a Monge matrix if its density matrix $A^\square$ is non-negative, that is, $A^\square[i][j] = A[i][j+1] + A[i+1][j] - A[i][j] - A[i+1][j+1] \ge 0$ holds for all $i$ and $j$. The min-plus product of two $n \times n$ Monge matrices can be computed in $\mathcal{O}(n^2)$ time using the SMAWK algorithm [1], and the resulting matrix still satisfies the Monge property. Monge matrices arise in many combinatorial optimization problems; see [8, 7] for surveys. One of the most successful applications is for planar graphs, where the distances between vertices on a single face satisfy the Monge property (see [18, Section 2.3]). This observation allowed for an $\tilde{\mathcal{O}}(n)$-time single-source shortest path algorithm for planar graphs (with negative real weights) [18], and the resulting techniques now belong to the standard toolkit for designing planar graph algorithms; see, e.g., [5, 25, 11]. (Throughout this paper, we use the $\tilde{\mathcal{O}}(\cdot)$ notation to suppress factors poly-logarithmic in the input size.)
Another important application of Monge matrices is in sequence alignment problems, such as edit distance and longest common subsequence (LCS), as well as in the related longest increasing subsequence (LIS) problem. Already in the late 1980s, Apostolico, Atallah, Larmore, and McFaddin [3] noted that the so-called DIST matrices, which (among others) specify the weighted edit distances between prefixes of one string and suffixes of another string, satisfy the Monge property. A modern interpretation of this observation is that these matrices store boundary-to-boundary distances in planar alignment graphs [37]; see also [4, 26, 32] for further early applications of DIST matrices and their Monge property.
In the late 2000s, Tiskin [39, 42] observed that the DIST matrices originating from the unweighted variants of edit distance, LCS, and LIS problems are more structured. For this, he introduced the notions of unit-Monge matrices, whose density matrices are permutation matrices (that is, binary matrices whose rows and columns contain exactly a single entry each) and simple Monge matrices, whose leftmost column and bottommost row consist of zeroes. He also proved that the product of two simple unit-Monge matrices still belongs to this class and can be computed in $\mathcal{O}(n \log n)$ time provided that each matrix is represented using the underlying permutation [41]. By now, the resulting algorithm has found numerous applications, including for computing LCS and edit distance of compressed strings [20, 24, 41, 19], maintaining these similarity measures for dynamic strings [13, 22], approximate pattern matching [40, 14], parallel and distributed algorithms for similarity measures [31, 33], and oracles for substring similarity [35, 12, 36]. Furthermore, Tiskin's algorithm has been used to solve the LIS problem in various settings, such as dynamic [28], parallel [9], and distributed [30]. A disadvantage of Tiskin's original description (and even the later informal descriptions [29]) is its dependence on the algebraic structure known as the monoid of seaweed braids, which natively supports unweighted LCS only (tasks involving edit distance need to be reduced to LCS counterparts). This makes the algorithm difficult to generalize to weighted problems and extend even to seemingly simple questions such as recovering (an implicit representation of) the witness indices $j$ such that $C[i][k] = A[i][j] + B[j][k]$ [27].
Russo [34] identified the number of non-zero elements in $A^\square$, $B^\square$, and $C^\square$, called the core size $\delta$, as the parameter that enables fast min-plus matrix multiplication. It is easy to see that $A$, $B$, and $C = A \otimes B$ can be stored in $\mathcal{O}(n + \delta)$ space using a condensed representation, consisting of the boundary entries (the leftmost column and bottommost row, e.g., suffice) and the core elements, i.e., the non-zero entries of the density matrix. Then, Russo's algorithm for the core-sparse Monge matrix multiplication problem computes the condensed representation of $C$ in $\tilde{\mathcal{O}}(n + \delta)$ time when provided with constant-time random access to $A$ and $B$, and with additional logarithmic-factor overhead when given only the condensed representations of $A$ and $B$. (More precisely, Russo's algorithm builds a data structure that provides polylogarithmic-time random access to the entries of $C$. Consequently, for repeated multiplication, we cannot assume constant-time access to the input matrices. Hence, the latter bound is the more realistic one.) Russo's algorithm has a very different structure than Tiskin's: it relies on a clever adaptation of the SMAWK algorithm [1] to provide efficient random access to $C$ and then employs binary search to find the individual non-zero entries of $C^\square$. This raises the question of unifying both approaches and understanding the complexity of core-sparse Monge matrix multiplication.
Our Results.
We consider the core-sparse Monge matrix multiplication problem from the combinatorial and algorithmic points of view, and confirm that the core size is the right parameter that enables fast min-plus matrix multiplication. Let $\delta(A)$, or the core size of $A$, denote the number of non-zero elements in the density matrix $A^\square$. We begin with analyzing, in Section 3, how the core size of $A \otimes B$ depends on the core sizes of $A$ and $B$.
Theorem 1.1.
Let $A$ be a $p \times q$ Monge matrix, and let $B$ be a $q \times r$ Monge matrix. We have $\delta(A \otimes B) \le 2 \cdot (\delta(A) + \delta(B))$.
We stress that this is the first bound on $\delta(A \otimes B)$ expressed solely in terms of $\delta(A)$ and $\delta(B)$: the complexity analysis of Russo's algorithm [34] requires a bound on all of $\delta(A)$, $\delta(B)$, and $\delta(A \otimes B)$.
Next, in Section 4, we generalize Tiskin's algorithm (but fully avoiding the formalism of the seaweed product) to solve the core-sparse Monge matrix multiplication problem. We believe that the more elementary interpretation makes our viewpoint not only more robust but also easier to understand. At the same time, the extension from simple unit-Monge matrices to general core-sparse Monge matrices introduces a few technical complications handled in the full version of the paper [21]. Notably, we need to keep track of the leftmost column and the bottommost row instead of assuming that they are filled with zeroes. Further, the core does not form a permutation matrix, so splitting it into two halves of the same size requires some calculations.
Theorem 1.2.
There is a (deterministic) algorithm that, given the condensed representations of a $p \times q$ Monge matrix $A$ and a $q \times r$ Monge matrix $B$, in $\mathcal{O}((p + q + r + \delta(A) + \delta(B)) \cdot \log(p + q + r))$ time computes the condensed representation of $A \otimes B$. (In the technical sections of the paper, we use the $\mathcal{O}$-notation conservatively. Specifically, we interpret $\mathcal{O}(f(n_1, \ldots, n_k))$ as the set of functions $g$ for which there are constants $c, n_0 > 0$ such that $g(n_1, \ldots, n_k) \le c \cdot f(n_1, \ldots, n_k)$ holds for all valid tuples $(n_1, \ldots, n_k)$ satisfying $n_1 + \cdots + n_k \ge n_0$. Accordingly, whenever the expression inside $\mathcal{O}(\cdot)$ depends on multiple parameters, we sometimes add $1$ or $2$ to the arguments of logarithms to ensure formal correctness in corner cases.)
The above complexity improves upon Russo's [34] and matches Tiskin's [41] for the simple unit-Monge case. Thanks to the more direct description, we can easily extend our algorithm to build (in the same complexity) an $\mathcal{O}(p + q + r + \delta(A) + \delta(B))$-size data structure that, given $(i, k)$, in $\mathcal{O}(\log(p + q + r))$ time computes the smallest witness $j$ such that $(A \otimes B)[i][k] = A[i][j] + B[j][k]$.
Applications.
As an application of our witness recovery functionality, we consider the problem of range LIS queries. This task asks to preprocess an integer array $A$ so that, later, given two indices $l \le r$, the longest increasing subsequence (LIS) of the sub-array $A[l \,..\, r]$ can be reported efficiently. Tiskin [38, 41] showed that the LIS size can be reported in $\mathcal{O}(\log n)$ time after $\mathcal{O}(n \log^2 n)$-time preprocessing. It was unclear, though, how to efficiently report the LIS itself. The recent work of Karthik C. S. and Rahul [27] achieved $\tilde{\mathcal{O}}(\ell)$-time reporting (correct with high probability) after $\tilde{\mathcal{O}}(n^{3/2})$-time preprocessing. It is fairly easy to use our witness recovery algorithm to deterministically support $\mathcal{O}(\ell \log n)$-time reporting after $\mathcal{O}(n \log^2 n)$-time preprocessing; see Section 5 for an overview of this result. As further shown in the full version of the paper [21], the reporting time can be improved to $\mathcal{O}(\ell + \log n)$ and, with the preprocessing time increased to $\tilde{\mathcal{O}}(n^{1+\varepsilon})$, all the way to $\mathcal{O}(\ell)$.
Theorem 1.3.
For every parameter $\tau \in [2 \,..\, n]$, there exists an algorithm that, given an integer array $A$ of size $n$, in $\tilde{\mathcal{O}}(n \cdot \tau)$ time builds a data structure that can answer range LIS reporting queries in time $\mathcal{O}(\ell + \log_\tau n)$, where $\ell$ is the length of the reported sequence.
In particular, there is an algorithm with $\tilde{\mathcal{O}}(n)$ preprocessing and $\mathcal{O}(\ell + \log n)$ reporting time and, for any constant $\varepsilon > 0$, an algorithm with $\tilde{\mathcal{O}}(n^{1+\varepsilon})$ preprocessing and $\mathcal{O}(\ell)$ reporting time.
In parallel to this work, Gorbachev and Kociumaka [22] used core-sparse Monge matrix multiplication for the weighted edit distance with integer weights. Theorem 1.2 allowed for saving two logarithmic factors in the final time complexities compared to the initial preprint using Russo’s approach [34]. Weighted edit distance is known to be reducible to unweighted LCS only for a very limited class of so-called uniform weight functions [38], so this application requires the general core-sparse Monge matrix multiplication.
Open Problem.
An interesting open problem is whether any non-trivial trade-off can be achieved for the weighted version of range LIS queries, where each element of $A$ has a weight, and the task is to compute a maximum-weight increasing subsequence of $A[l \,..\, r]$: either the weight alone or the whole subsequence. Surprisingly, as we show in the full version of the paper [21], if our bound of Theorem 1.1 can be improved to $(2 - \varepsilon) \cdot (\delta(A) + \delta(B))$ for some constant $\varepsilon > 0$, then our techniques automatically yield a solution with $\mathcal{O}(n^{2 - \varepsilon'})$ preprocessing time, $\tilde{\mathcal{O}}(1)$ query time, and $\tilde{\mathcal{O}}(\ell)$ reporting time for some constant $\varepsilon' > 0$.
2 Preliminaries
Definition 2.1.
A matrix $A$ of size $p \times q$ is a Monge matrix if it satisfies the following Monge property for every $0 \le i < p - 1$ and $0 \le j < q - 1$: $A[i][j] + A[i+1][j+1] \le A[i][j+1] + A[i+1][j]$.
Furthermore, $A$ is an anti-Monge matrix if the matrix $-A$ (with negated entries) is a Monge matrix.
In some sources, the following equivalent (but seemingly stronger) condition is taken as the definition of Monge matrices.
Observation 2.2.
A matrix $A$ of size $p \times q$ is a Monge matrix if and only if it satisfies the following inequality for all integers $0 \le i \le i' < p$ and $0 \le j \le j' < q$: $A[i][j] + A[i'][j'] \le A[i][j'] + A[i'][j]$.
Proof.
It suffices to sum the inequality in Definition 2.1 for all $i \le i'' < i'$ and $j \le j'' < j'$: the intermediate terms cancel out.
Definition 2.3.
The min-plus product $A \otimes B$ of a matrix $A$ of size $p \times q$ and a matrix $B$ of size $q \times r$ is a matrix $C$ of size $p \times r$ satisfying $C[i][k] = \min_{0 \le j < q} (A[i][j] + B[j][k])$ for all $0 \le i < p$ and $0 \le k < r$.
For $0 \le i < p$ and $0 \le k < r$, we call $j$ a witness of $C[i][k]$ if and only if $C[i][k] = A[i][j] + B[j][k]$. We define the witness matrix $W$ such that $W[i][k]$ is the smallest witness of $C[i][k]$ for each $0 \le i < p$ and $0 \le k < r$.
The min-plus product of two Monge matrices is also Monge.
Fact 2.4 ([48, Corollary A]).
Let $A$, $B$, and $C$ be matrices such that $C = A \otimes B$. If $A$ and $B$ are Monge, then $C$ is also Monge.
In the context of Monge matrices, it is useful to define the core and the density matrix.
Definition 2.5.
The density matrix $A^\square$ of a matrix $A$ of size $p \times q$ is a matrix of size $(p-1) \times (q-1)$ satisfying $A^\square[i][j] = A[i][j+1] + A[i+1][j] - A[i][j] - A[i+1][j+1]$ for $0 \le i < p - 1$ and $0 \le j < q - 1$.
We define the core of the matrix $A$ as
$$\mathrm{core}(A) = \{(i, j, A^\square[i][j]) : A^\square[i][j] \neq 0\}$$
and denote the core size of the matrix by $\delta(A) = |\mathrm{core}(A)|$. Furthermore, we define the core sum of the matrix as the sum of the values of all core elements of $A$, that is,
$$\sigma(A) = \sum_{(i, j, v) \in \mathrm{core}(A)} v.$$
Note that, for a Monge matrix $A$, all entries of its density matrix are non-negative, and thus $\mathrm{core}(A)$ consists of triples $(i, j, v)$ with some positive values $v$.
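The following Python sketch (with helper names of our choosing) implements Definition 2.5 and the derived quantities; it doubles as a Monge test, since Definition 2.1 is equivalent to the density matrix being non-negative.

```python
def density(A):
    """Definition 2.5: the (p-1) x (q-1) density matrix of A."""
    p, q = len(A), len(A[0])
    return [[A[i][j + 1] + A[i + 1][j] - A[i][j] - A[i + 1][j + 1]
             for j in range(q - 1)] for i in range(p - 1)]

def core(A):
    """Triples (i, j, v) over the non-zero density entries."""
    return [(i, j, v) for i, row in enumerate(density(A))
            for j, v in enumerate(row) if v != 0]

def is_monge(A):
    """Definition 2.1, restated: the density matrix is non-negative."""
    return all(v >= 0 for row in density(A) for v in row)

A = [[0, 0, 0], [0, -1, -1], [0, -1, -2]]
print(is_monge(A))                    # True
print(core(A))                        # [(0, 0, 1), (1, 1, 1)] -> delta(A) = 2
print(sum(v for _, _, v in core(A)))  # core sum sigma(A) = 2
```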
For any matrix $A$ of size $p \times q$ and integers $0 \le i \le i' < p$ and $0 \le j \le j' < q$, we write $A[i \,..\, i'][j \,..\, j']$ to denote the contiguous submatrix of $A$ consisting of all entries on the intersection of rows $i, \ldots, i'$ and columns $j, \ldots, j'$ of $A$. Matrices $A[i \,..\, i')[j \,..\, j']$, $A(i \,..\, i'][j \,..\, j')$, etc., are defined analogously.
3 Properties of $\delta(A \otimes B)$ and $\sigma(A \otimes B)$
In this section, we provide some useful properties of $\delta(A \otimes B)$ and $\sigma(A \otimes B)$.
Most importantly, we show how to bound $\delta(A \otimes B)$ in terms of $\delta(A)$ and $\delta(B)$.
The following observation is a straightforward consequence of the definitions of $A$ and $A^\square$.
Observation 3.1.
For a matrix $A$ of size $p \times q$, and integers $0 \le i \le i' < p$ and $0 \le j \le j' < q$,
$$A[i][j'] + A[i'][j] - A[i][j] - A[i'][j'] = \sum_{i'' = i}^{i' - 1} \sum_{j'' = j}^{j' - 1} A^\square[i''][j''].$$
Proof.
By the definition of $A^\square$, we have $A^\square[i''][j''] = A[i''][j''+1] + A[i''+1][j''] - A[i''][j''] - A[i''+1][j''+1]$ for all $i \le i'' < i'$ and $j \le j'' < j'$. The desired equality follows by summing up all these equalities: the intermediate terms cancel out.
An application of Observation 3.1 for $i = 0$ and $j = 0$ implies that every value of $A$ can be uniquely reconstructed from $A$'s core and the values in the topmost row and the leftmost column of $A$.
Definition 3.2.
The condensed representation of a matrix $A$ consists of the core of $A$ as well as the values in the topmost row and the leftmost column of $A$.
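A short sketch of this reconstruction, under the same conventions (topmost row, leftmost column, and the core as a list of triples; all names are ours):

```python
def entry(top_row, left_col, core_elems, i, j):
    """A[i][j] from the condensed representation, via Observation 3.1
    applied with the top-left corner; O(delta(A)) time per entry."""
    range_sum = sum(v for (i2, j2, v) in core_elems if i2 < i and j2 < j)
    return top_row[j] + left_col[i] - top_row[0] - range_sum

A = [[0, 0, 0], [0, -1, -1], [0, -1, -2]]   # Monge; core {(0,0,1), (1,1,1)}
top, left = A[0], [row[0] for row in A]
core_elems = [(0, 0, 1), (1, 1, 1)]
assert all(entry(top, left, core_elems, i, j) == A[i][j]
           for i in range(3) for j in range(3))
```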
Observation 3.3.
Any submatrix (not necessarily contiguous) of a Monge matrix is Monge.
Proof.
By Observation 2.2, if $A$ is Monge, then $A[i][j] + A[i'][j'] \le A[i][j'] + A[i'][j]$ holds for every choice of rows $i \le i'$ and columns $j \le j'$, in particular for the rows and columns retained in a (not necessarily contiguous) submatrix of $A$. Applying this inequality to adjacent rows and columns of the submatrix shows that it satisfies Definition 2.1.
We next show a crucial property of the witness matrix.
Lemma 3.4 ([48, Theorem 1]).
For any two Monge matrices, $A$ of size $p \times q$ and $B$ of size $q \times r$, the witness matrix $W$ of $A \otimes B$ is non-decreasing by rows and columns.
Proof.
We prove that $W$ is non-decreasing by columns; the claim for rows is analogous. Fix some $0 \le i < p$ and $0 \le k < r - 1$. Let $j = W[i][k]$ and $j' = W[i][k+1]$, and suppose for the sake of contradiction that $j' < j$. As $j$ is the smallest witness for $C[i][k]$, we have $A[i][j] + B[j][k] < A[i][j'] + B[j'][k]$. Using this fact and the Monge property for $B$ (Observation 2.2 with rows $j' < j$ and columns $k < k + 1$), we derive
$$A[i][j] + B[j][k+1] \le A[i][j] + B[j][k] + B[j'][k+1] - B[j'][k] < A[i][j'] + B[j'][k+1] = C[i][k+1].$$
Since $C[i][k+1] \le A[i][j''] + B[j''][k+1]$ for all $j''$, this is a contradiction, and we conclude that $j \le j'$ holds as required.
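Lemma 3.4 is also easy to validate empirically; the sketch below generates Monge matrices from random non-negative densities and checks the monotonicity of the smallest-witness matrix (all helper names are ours):

```python
import random

def random_monge(p, q):
    """Zero boundary plus a random non-negative density => Monge."""
    A = [[0] * q for _ in range(p)]
    for i in range(1, p):
        for j in range(1, q):
            A[i][j] = A[i - 1][j] + A[i][j - 1] - A[i - 1][j - 1] \
                      - random.randint(0, 2)  # density value in {0, 1, 2}
    return A

def smallest_witnesses(A, B):
    q = len(B)
    return [[min(range(q), key=lambda j: (A[i][j] + B[j][k], j))
             for k in range(len(B[0]))] for i in range(len(A))]

for _ in range(100):
    A, B = random_monge(5, 4), random_monge(4, 6)
    W = smallest_witnesses(A, B)
    assert all(W[i][k] <= W[i][k + 1] for i in range(5) for k in range(5))
    assert all(W[i][k] <= W[i + 1][k] for i in range(4) for k in range(6))
```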
We now show how to bound $\sigma(A \otimes B)$ in terms of $\sigma(A)$ and $\sigma(B)$.
Lemma 3.5.
Let $A$ be a $p \times q$ Monge matrix, and let $B$ be a $q \times r$ Monge matrix. Then, $\sigma(A \otimes B) \le \min\{\sigma(A), \sigma(B)\}$.
Proof.
Let $C = A \otimes B$. Denote $j_1 = W[0][0]$ and $j_2 = W[p-1][r-1]$, where $j_1 \le j_2$ due to Lemma 3.4. We have $C[0][r-1] \le A[0][j_1] + B[j_1][r-1]$ and $C[p-1][0] \le A[p-1][j_2] + B[j_2][0]$. Furthermore, $C[0][0] = A[0][j_1] + B[j_1][0]$ and $C[p-1][r-1] = A[p-1][j_2] + B[j_2][r-1]$ due to the definition of $W$. Hence, due to Observation 3.1, we obtain
$$\sigma(C) = C[0][r-1] + C[p-1][0] - C[0][0] - C[p-1][r-1] \le B[j_1][r-1] + B[j_2][0] - B[j_1][0] - B[j_2][r-1] = \sum_{j = j_1}^{j_2 - 1} \sum_{k = 0}^{r - 2} B^\square[j][k] \le \sigma(B).$$
The inequality $\sigma(C) \le \sigma(A)$ can be obtained using a symmetric argument.
The following corollary says that every core element of $C = A \otimes B$ can be attributed to at least one core element of $A$ and at least one core element of $B$.
Corollary 3.6.
Let $A$ be a $p \times q$ Monge matrix, let $B$ be a $q \times r$ Monge matrix, and let $C = A \otimes B$. Consider integers $i$ and $k$ such that $C^\square[i][k] \neq 0$. There exist integers $j, j' \in [W[i][k] \,..\, W[i+1][k+1])$ such that $A^\square[i][j] \neq 0$ and $B^\square[j'][k] \neq 0$.
Proof.
Let $j_1 = W[i][k]$ and $j_2 = W[i+1][k+1]$. By monotonicity of $W$ (Lemma 3.4), we have $j_1 \le j_2$, and the witnesses of all four entries of $C[i \,..\, i+1][k \,..\, k+1]$ lie in $[j_1 \,..\, j_2]$. Thus, $C[i \,..\, i+1][k \,..\, k+1] = A[i \,..\, i+1][j_1 \,..\, j_2] \otimes B[j_1 \,..\, j_2][k \,..\, k+1]$. Due to Lemma 3.5, we have $0 < C^\square[i][k] = \sigma(C[i \,..\, i+1][k \,..\, k+1]) \le \sigma(A[i \,..\, i+1][j_1 \,..\, j_2])$. Thus, there exists a core element in $A[i \,..\, i+1][j_1 \,..\, j_2]$, that is, an index $j \in [j_1 \,..\, j_2)$ with $A^\square[i][j] \neq 0$.
Symmetrically, $0 < \sigma(C[i \,..\, i+1][k \,..\, k+1]) \le \sigma(B[j_1 \,..\, j_2][k \,..\, k+1])$, and there exists a core element in $B[j_1 \,..\, j_2][k \,..\, k+1]$, that is, an index $j' \in [j_1 \,..\, j_2)$ with $B^\square[j'][k] \neq 0$.
We now use Corollary 3.6 to derive a bound on $\delta(A \otimes B)$ in terms of $\delta(A)$ and $\delta(B)$. The underlying idea is to show that every core element of $C = A \otimes B$ is either the first or the last one (in the lexicographic ordering) that the mapping of Corollary 3.6 attributes to the corresponding core element of $A$ or $B$ (see Figure 1 for why the opposite would lead to a contradiction).
Theorem 1.1. [Restated, see original statement.]
Let $A$ be a $p \times q$ Monge matrix, and let $B$ be a $q \times r$ Monge matrix. We have $\delta(A \otimes B) \le 2 \cdot (\delta(A) + \delta(B))$.
Proof.
Let $C = A \otimes B$. Define a function $f_A : \mathrm{core}(C) \to \mathrm{core}(A)$ that maps every core element $(i, k, C^\square[i][k])$ to $(i, j, A^\square[i][j])$ for the smallest $j \ge W[i][k]$ with $A^\square[i][j] \neq 0$; such a $j$ exists and satisfies $j < W[i+1][k+1]$ due to Corollary 3.6. Analogously, we define a function $f_B : \mathrm{core}(C) \to \mathrm{core}(B)$ that maps every core element $(i, k, C^\square[i][k])$ to $(j', k, B^\square[j'][k])$ for the smallest $j' \ge W[i][k]$ with $B^\square[j'][k] \neq 0$; again, such a $j'$ exists and satisfies $j' < W[i+1][k+1]$ due to Corollary 3.6.
Claim 3.7.
Every core element of $C$ is the lexicographically minimal or maximal one in its pre-image under $f_A$ or its pre-image under $f_B$.
Proof.
For a proof by contradiction, pick an element $(i, k, v) \in \mathrm{core}(C)$ violating the claim. Let $(i, j, v_A) = f_A(i, k, v)$ and $(j', k, v_B) = f_B(i, k, v)$. We make four symmetric arguments (all illustrated in Figure 1), with the first one presented in more detail.
1. Since $(i, k, v)$ is not the minimal core element in its pre-image under $f_A$, we have $f_A(i, k_1, v_1) = (i, j, v_A)$ for some core element $(i, k_1, v_1)$ that precedes $(i, k, v)$ in the lexicographical order, denoted $(i, k_1, v_1) \prec (i, k, v)$. Due to the shared first coordinate, we must have $k_1 \neq k$ and thus $k_1 < k$. Since $f_A(i, k_1, v_1) = (i, j, v_A)$, Corollary 3.6 implies $j < W[i+1][k_1+1]$. The monotonicity of the witness matrix (Lemma 3.4) thus yields $W[i+1][k_1+1] \le W[i+1][k]$. Overall, we conclude that $j < W[i+1][k]$.
2. Since $(i, k, v)$ is not the maximal core element in its pre-image under $f_A$, we have $f_A(i, k_2, v_2) = (i, j, v_A)$ for some core element $(i, k_2, v_2) \succ (i, k, v)$. In particular, $k_2 \ge k + 1$ for some $k_2 > k$. Then, $W[i][k+1] \le W[i][k_2] \le j$ follows from Lemma 3.4 and the definition of $f_A$, respectively.
3. Since $(i, k, v)$ is not the minimal core element in its pre-image under $f_B$, we have $f_B(i_1, k, v_1') = (j', k, v_B)$ for some core element $(i_1, k, v_1') \prec (i, k, v)$. In particular, $i_1 + 1 \le i$ for some $i_1 < i$. Then, $j' < W[i_1+1][k+1] \le W[i][k+1]$ follows from Corollary 3.6 and Lemma 3.4, respectively.
4. Since $(i, k, v)$ is not the maximal core element in its pre-image under $f_B$, we have $f_B(i_2, k, v_2') = (j', k, v_B)$ for some core element $(i_2, k, v_2') \succ (i, k, v)$. In particular, $i_2 \ge i + 1$ for some $i_2 > i$. Then, $W[i+1][k] \le W[i_2][k] \le j'$ follows from Lemma 3.4 and the definition of $f_B$, respectively.
Overall, we derive a contradiction: the first and the fourth argument yield $j < W[i+1][k] \le j'$, whereas the second and the third one yield $j' < W[i][k+1] \le j$. As every core element of $C$ is either the first or the last one in some pre-image under $f_A$ or $f_B$, and every element of $\mathrm{core}(A) \cup \mathrm{core}(B)$ contributes at most two such extremal elements, we get that $\delta(C) \le 2 \cdot (\delta(A) + \delta(B))$.
Example 3.8.
Note that the inequality for the core size from Theorem 1.1 is weaker than the inequality for the core sum from Lemma 3.5. We claim that this weakening is not an artifact of our proof: no analogue of Lemma 3.5 holds for the core size. Consider the following Monge matrices
$$A = \begin{pmatrix} 0 & 0 \\ 0 & -1 \\ 0 & -2 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}, \quad \text{for which} \quad A \otimes B = \begin{pmatrix} 0 & 0 \\ 0 & -1 \\ 0 & -2 \end{pmatrix}.$$
We have $\delta(A) = 2$, $\delta(B) = 1$, $\delta(A \otimes B) = 2 > \min\{\delta(A), \delta(B)\}$, and yet $\sigma(A \otimes B) = 2 = \min\{\sigma(A), \sigma(B)\}$.
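These quantities can be double-checked mechanically, e.g., with the following self-contained Python snippet (requiring Python 3.8+ for the := operator):

```python
def min_plus(A, B):
    return [[min(A[i][j] + B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def core(A):
    return [(i, j, d) for i in range(len(A) - 1) for j in range(len(A[0]) - 1)
            if (d := A[i][j + 1] + A[i + 1][j] - A[i][j] - A[i + 1][j + 1]) != 0]

A = [[0, 0], [0, -1], [0, -2]]   # delta(A) = 2, sigma(A) = 2
B = [[0, 2], [2, 0]]             # delta(B) = 1, sigma(B) = 4
C = min_plus(A, B)
print(C)        # [[0, 0], [0, -1], [0, -2]]
print(core(C))  # [(0, 0, 1), (1, 0, 1)]: delta(C) = 2 > min(delta(A), delta(B))
```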
Remark 3.9.
To the best of our knowledge, Theorem 1.1 gives the first bound on the core size of $A \otimes B$ in terms of the core sizes of $A$ and $B$. Previously, such a bound was only known for unit-Monge matrices [41]. In particular, our bound allows simplifying some time complexities of already existing algorithms. For example, [34, Lemma 18] shows how to compute the product of two explicitly given Monge matrices $A$ and $B$ in time proportional (up to logarithmic factors) to $n + \delta(A) + \delta(B) + \delta(A \otimes B)$. Since Theorem 1.1 yields $\delta(A \otimes B) = \mathcal{O}(\delta(A) + \delta(B))$, we can drop the third summand in the time complexity.
4 Core-Sparse Monge Matrix Min-Plus Multiplication Algorithm
In this section, we present an algorithm that, given the condensed representations of two Monge matrices, computes the condensed representation of their min-plus product. Even though the condensed representation itself does not provide efficient random access to the values of the matrix, the following fact justifies our choice of the representation of Monge matrices.
Fact 4.1 ([22, Lemma 4.4]).
There exists an algorithm that, given the condensed representation of a matrix $A$ of size $p \times q$, in $\mathcal{O}(p + q + \delta(A) \log(2 + \delta(A)))$ time builds a core-based matrix oracle data structure that provides $\mathcal{O}(\log(2 + \delta(A)))$-time random access to the entries of $A$.
Proof Sketch.
By Observation 3.1 applied for $i = 0$ and $j = 0$, it suffices to answer orthogonal range sum queries on top of $\mathrm{core}(A)$. According to [45, Theorem 3], this can be done in $\mathcal{O}(\delta(A) \log(2 + \delta(A)))$ preprocessing time and $\mathcal{O}(\log(2 + \delta(A)))$ query time.
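The following simplified sketch replaces the logarithmic-time range-sum structure of [45] with a plain 2D prefix-sum table (so preprocessing is proportional to $p \cdot q$ rather than near-linear in $\delta(A)$); it is meant only to illustrate how Observation 3.1 reduces random access to orthogonal range sums, and all names are ours:

```python
class CoreOracle:
    """Random access to A from its condensed representation; range sums
    over the core are precomputed as a 2D prefix-sum table."""
    def __init__(self, top_row, left_col, core_elems):
        self.top, self.left = top_row, left_col
        p, q = len(left_col), len(top_row)
        # P[i][j] = sum of core values (i2, j2, v) with i2 < i and j2 < j.
        P = [[0] * q for _ in range(p)]
        for i2, j2, v in core_elems:
            P[i2 + 1][j2 + 1] += v
        for i in range(1, p):
            for j in range(1, q):
                P[i][j] += P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1]
        self.P = P

    def access(self, i, j):
        # Observation 3.1 with i = j = 0, rearranged for A[i][j].
        return self.top[j] + self.left[i] - self.top[0] - self.P[i][j]

A = [[0, 0, 0], [0, -1, -1], [0, -1, -2]]
oracle = CoreOracle(A[0], [r[0] for r in A], [(0, 0, 1), (1, 1, 1)])
assert all(oracle.access(i, j) == A[i][j] for i in range(3) for j in range(3))
```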
Definition 4.2.
Let $A$ be a matrix of size $p \times q$. For every $0 \le i < p - 1$, denote $\mathrm{core}^{\mathrm{row}}_i(A) = \{(i', j, v) \in \mathrm{core}(A) : i' = i\}$ and $\delta^{\mathrm{row}}_i(A) = |\mathrm{core}^{\mathrm{row}}_i(A)|$. Analogously, for every $0 \le j < q - 1$, denote $\mathrm{core}^{\mathrm{col}}_j(A) = \{(i, j', v) \in \mathrm{core}(A) : j' = j\}$ and $\delta^{\mathrm{col}}_j(A) = |\mathrm{core}^{\mathrm{col}}_j(A)|$.
Observation 4.3.
$\delta(A) = \sum_{i=0}^{p-2} \delta^{\mathrm{row}}_i(A) = \sum_{j=0}^{q-2} \delta^{\mathrm{col}}_j(A)$.
We now design another matrix oracle as an alternative to the core-based matrix oracle of Fact 4.1 for the situation in which one wants to query an entry of $A$ that is adjacent to some other already known entry.
Lemma 4.4.
There is an algorithm that, given the condensed representation of a matrix $A$ of size $p \times q$, in $\mathcal{O}(p + q + \delta(A))$ time builds a local core oracle data structure with the following interface.
- Boundary access: given indices $i$ and $j$ such that $i = 0$ or $j = 0$, in $\mathcal{O}(1)$ time returns $A[i][j]$.
- Vertically adjacent recomputation: given indices $0 \le i < p - 1$ and $0 \le j < q$, together with the value $A[i][j]$, in $\mathcal{O}(1 + \delta^{\mathrm{row}}_i(A))$ time returns $A[i+1][j]$.
- Horizontally adjacent recomputation: given indices $0 \le i < p$ and $0 \le j < q - 1$, together with the value $A[i][j]$, in $\mathcal{O}(1 + \delta^{\mathrm{col}}_j(A))$ time returns $A[i][j+1]$.
Proof.
The local core oracle data structure of $A$ stores all the values of the topmost row and the leftmost column of $A$, as well as two collections of lists: $\mathrm{core}^{\mathrm{row}}_i(A)$ for all $0 \le i < p - 1$ and $\mathrm{core}^{\mathrm{col}}_j(A)$ for all $0 \le j < q - 1$. The values of the topmost row and the leftmost column of $A$ are already given. The lists $\mathrm{core}^{\mathrm{row}}_i(A)$ and $\mathrm{core}^{\mathrm{col}}_j(A)$ can be computed in $\mathcal{O}(p + q + \delta(A))$ time from $\mathrm{core}(A)$ using bucket sorting. Hence, we can build the data structure in time $\mathcal{O}(p + q + \delta(A))$.
Boundary access can be implemented trivially. We now show how to implement the vertically adjacent recomputation. Suppose that we are given indices $i$ and $j$ together with the value $A[i][j]$. Due to Observation 3.1, we have $A[i+1][j] = A[i][j] + A[i+1][0] - A[i][0] - \sum \{v : (i, j'', v) \in \mathrm{core}^{\mathrm{row}}_i(A),\ j'' < j\}$. Note that the values $A[i+1][0]$ and $A[i][0]$ can be computed in constant time using boundary access queries, and the sum can be computed from $\mathrm{core}^{\mathrm{row}}_i(A)$ in time $\mathcal{O}(1 + \delta^{\mathrm{row}}_i(A))$. Hence, $A[i+1][j]$ can be computed in time $\mathcal{O}(1 + \delta^{\mathrm{row}}_i(A))$.
The horizontally adjacent recomputation is implemented analogously so that $A[i][j+1]$ can be computed in time $\mathcal{O}(1 + \delta^{\mathrm{col}}_j(A))$.
Note that vertically and horizontally adjacent recomputations allow computing the neighbors of any given entry $A[i][j]$.
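A sketch of the vertically adjacent recomputation (with rows_core bucketing the core elements by row, as in the proof above; names are ours):

```python
def next_row_entry(left_col, rows_core, i, j, value_ij):
    """Given A[i][j], return A[i+1][j] (Observation 3.1 on rows i, i+1 and
    columns 0, j); costs O(1 + |rows_core[i]|) time."""
    row_sum = sum(v for (j2, v) in rows_core.get(i, []) if j2 < j)
    return value_ij + left_col[i + 1] - left_col[i] - row_sum

A = [[0, 0, 0], [0, -1, -1], [0, -1, -2]]
left = [row[0] for row in A]
rows_core = {0: [(0, 1)], 1: [(1, 1)]}  # per-row lists of (column, value)
assert next_row_entry(left, rows_core, 0, 2, A[0][2]) == A[1][2]
assert next_row_entry(left, rows_core, 1, 1, A[1][1]) == A[2][1]
```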
Lemma 4.5.
Given the condensed representation of a matrix $A$ of size $p \times q$, the condensed representation of any contiguous submatrix $A' = A[i_0 \,..\, i_1][j_0 \,..\, j_1]$ of $A$ can be computed in $\mathcal{O}(p + q + \delta(A))$ time.
Proof.
Say, $A' = A[i_0 \,..\, i_1][j_0 \,..\, j_1]$. Note that $\mathrm{core}(A')$ can be obtained in $\mathcal{O}(\delta(A))$ time by filtering out all elements of $\mathrm{core}(A)$ that lie outside of $A'$. It remains to obtain the topmost row and the leftmost column of $A'$. In $\mathcal{O}(p + q + \delta(A))$ time, we build the local core oracle of Lemma 4.4. By starting from $A[i_0][0]$ and repeatedly applying the horizontally adjacent recomputation, we can compute $A[i_0][0 \,..\, j_1]$ (and thus the topmost row of $A'$) in $\mathcal{O}(q + \delta(A))$ time. The leftmost column of $A'$ can be computed in $\mathcal{O}(p + \delta(A))$ time analogously.
We now show a helper lemma that compresses “ultra-sparse” Monge matrices to limit their dimensions by the size of their core.
Lemma 4.6 (Matrix compression).
There are two algorithms compress and decompress with the following properties: The compress algorithm, given the condensed representations of a $p \times q$ Monge matrix $A$ and a $q \times r$ Monge matrix $B$, in $\mathcal{O}(p + q + r + \delta(A) + \delta(B))$ time, builds a $p' \times q'$ Monge matrix $A'$ and a $q' \times r'$ Monge matrix $B'$ such that $p' = \mathcal{O}(1 + \delta(A))$, $q' = \mathcal{O}(1 + \delta(A) + \delta(B))$, $r' = \mathcal{O}(1 + \delta(B))$, $\delta(A') \le \delta(A)$, and $\delta(B') \le \delta(B)$. The decompress algorithm, given the condensed representations of $A$, $B$, and $C'$, where $C' = A' \otimes B'$, computes the condensed representation of $A \otimes B$ in $\mathcal{O}(p + q + r + \delta(A) + \delta(B))$ time.
Proof Sketch.
The compress algorithm deletes all rows of $A$ that do not contain any core elements and all columns of $B$ that do not contain any core elements. This way, $p' = \mathcal{O}(1 + \delta(A))$ and $r' = \mathcal{O}(1 + \delta(B))$ are guaranteed. Furthermore, all core elements of these two matrices are preserved. If there are no core elements between rows $i$ and $i'$ of $A$, the corresponding entries of these two rows differ by the constant $c = A[i'][0] - A[i][0]$, and thus all the entries of the row $i'$ of $A$ can be obtained from the corresponding entries of row $i$ of $A$ by adding $c$. Since row $i'$ of $A \otimes B$ can analogously be recovered from row $i$ of $A \otimes B$, no information about the answer is "lost" when deleting such rows. An analogous property holds for the removed columns of $B$. The decompress algorithm reverses the row and column removals performed by the compress algorithm.
The compress algorithm also reduces the number of columns of $A'$ and rows of $B'$ to $q' = \mathcal{O}(1 + \delta(A) + \delta(B))$. Observe that if there are no core elements between columns $j$ and $j'$ of $A$ and between rows $j$ and $j'$ of $B$, then the values $A[i][j] + B[j][k]$ and $A[i][j'] + B[j'][k]$ differ by a constant independent of $i$ and $k$. Depending on the sign of this constant difference, we can delete either the $j$-th column of $A$ and the $j$-th row of $B$, or the $j'$-th column of $A$ and the $j'$-th row of $B$, so that we do not change the min-plus product of these matrices. By repeating this process for all indices that do not "contribute" to the cores of $A$ and $B$, we get $q' = \mathcal{O}(1 + \delta(A) + \delta(B))$. The formal proof can be found in the full version of the paper [21].
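The first compression step is easy to illustrate on dense matrices (the actual algorithm, of course, works with condensed representations); the helper below, with names of our choosing, drops the rows of $A$ that no core element touches and returns the kept indices for decompression:

```python
def drop_coreless_rows(A):
    """Keep row 0 and every row i+1 whose density row i has a non-zero;
    every deleted row equals some kept row shifted by a constant."""
    p, q = len(A), len(A[0])
    keep = [0] + [i + 1 for i in range(p - 1)
                  if any(A[i][j + 1] + A[i + 1][j] - A[i][j] - A[i + 1][j + 1]
                         for j in range(q - 1))]
    return [A[i] for i in keep], keep

A = [[0, 0, 0], [0, 0, -1], [0, 0, -1], [1, 1, 0], [2, 2, 1]]
compressed, keep = drop_coreless_rows(A)
print(keep)  # [0, 1]: rows 2, 3, 4 are constant shifts of row 1
```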
Finally, we show our main Monge matrix multiplication algorithm.
Theorem 1.2. [Restated, see original statement.]
There is a (deterministic) algorithm that, given the condensed representations of a $p \times q$ Monge matrix $A$ and a $q \times r$ Monge matrix $B$, in $\mathcal{O}((p + q + r + \delta(A) + \delta(B)) \cdot \log(p + q + r))$ time computes the condensed representation of $A \otimes B$. (In the technical sections of the paper, we use the $\mathcal{O}$-notation conservatively. Specifically, we interpret $\mathcal{O}(f(n_1, \ldots, n_k))$ as the set of functions $g$ for which there are constants $c, n_0 > 0$ such that $g(n_1, \ldots, n_k) \le c \cdot f(n_1, \ldots, n_k)$ holds for all valid tuples $(n_1, \ldots, n_k)$ satisfying $n_1 + \cdots + n_k \ge n_0$. Accordingly, whenever the expression inside $\mathcal{O}(\cdot)$ depends on multiple parameters, we sometimes add $1$ or $2$ to the arguments of logarithms to ensure formal correctness in corner cases.)
Proof Sketch.
We design a recursive divide-and-conquer procedure (see Algorithm 1 for the details) and solve the problem by initially calling it on $A$ and $B$. Given the matrices $A$ and $B$, we first apply the compress algorithm of Lemma 4.6 to compress $A$ and $B$ into a $p' \times q'$ matrix $A'$ and a $q' \times r'$ matrix $B'$, respectively. We then compute the condensed representation of $C' = A' \otimes B'$ and finally use the decompress algorithm of Lemma 4.6 to decompress $C'$ into $C = A \otimes B$.
If $q' = 1$, we compute the matrix $C'$ trivially. Otherwise, we pick a splitting point $m \in [1 \,..\, q')$, split the matrix $A'$ vertically into the matrix $A'_L$ of size $p' \times m$ and the matrix $A'_R$ of size $p' \times (q' - m)$, and split the matrix $B'$ horizontally into the matrix $B'_T$ of size $m \times r'$ and the matrix $B'_B$ of size $(q' - m) \times r'$. We pick $m$ in such a way that it splits the cores of $A'$ and $B'$ almost equally across $A'_L$ and $A'_R$ and across $B'_T$ and $B'_B$, that is, so that each of $\delta(A'_L) + \delta(B'_T)$ and $\delta(A'_R) + \delta(B'_B)$ is (approximately) at most half of $\delta(A') + \delta(B')$. We recursively compute the condensed representations of the matrices $C_L = A'_L \otimes B'_T$ and $C_R = A'_R \otimes B'_B$. The resulting matrix $C'$ can be obtained as the element-wise minimum of $C_L$ and $C_R$. Furthermore, one can see that, due to Lemma 3.4, in some top-left region of $C'$, the values are equal to the corresponding values of $C_L$, and in the remaining bottom-right region of $C'$, the values are equal to the corresponding values of $C_R$; see Figure 2. We find the boundary between these two regions by starting in the bottom-left corner of $C'$ and traversing it towards the top-right corner along this boundary. We use the local core oracles (Lemma 4.4) of $C_L$ and $C_R$ to sequentially compute the subsequent entries along the boundary. Having determined the boundary, we construct the core of $C'$ by picking the core elements of $C_L$ located to the top-left of this boundary, picking the core elements of $C_R$ located to the bottom-right of this boundary, and computing the values on the boundary from scratch. The values in the topmost row and the leftmost column of $C'$ can be trivially obtained as element-wise minima of the corresponding values in $C_L$ and $C_R$. This concludes the recursive procedure; see Algorithm 1 for the pseudocode. This algorithm follows the classical divide-and-conquer framework, and thus its time complexity can be easily derived from Theorem 1.1. The full description of the algorithm, the proof of its correctness, and the formal analysis of its time complexity can be found in the full version of the paper [21].
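For intuition, the following heavily simplified sketch performs the divide and conquer on dense matrices, omitting compression and condensed representations entirely (and therefore running in quadratic time); it shows the split along the middle dimension and the element-wise minimum whose "which side wins" boundary is the monotone staircase discussed above:

```python
def multiply(A, B):
    """Min-plus product by divide and conquer over the middle dimension;
    dense version: C = elementwise-min(A_L x B_T, A_R x B_B)."""
    q = len(B)
    if q == 1:
        return [[A[i][0] + B[0][k] for k in range(len(B[0]))]
                for i in range(len(A))]
    m = q // 2
    C_L = multiply([row[:m] for row in A], B[:m])   # witnesses j <  m
    C_R = multiply([row[m:] for row in A], B[m:])   # witnesses j >= m
    # By Lemma 3.4, the entries where C_L attains the minimum form a
    # top-left staircase region; the real algorithm traverses only this
    # boundary instead of touching every entry.
    return [[min(x, y) for x, y in zip(rl, rr)] for rl, rr in zip(C_L, C_R)]

A = [[0, 0], [0, -1], [0, -2]]
B = [[0, 2], [2, 0]]
print(multiply(A, B))  # [[0, 0], [0, -1], [0, -2]]
```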
If the $\otimes$-product of matrices represents the lengths of the shortest paths in a graph, the corresponding witness matrix allows computing the underlying shortest paths themselves. For that, we extend the presented algorithm to allow witness reconstruction.
Theorem 4.7.
The algorithm of Theorem 1.2 can be extended so that, within the same time complexity, it also builds a data structure that takes $\mathcal{O}(p + q + r + \delta(A) + \delta(B))$ space and provides $\mathcal{O}(\log(p + q + r))$-time oracle access to the witness matrix $W$ of $A \otimes B$.
Proof Sketch.
We slightly modify the algorithm of Theorem 1.2. For the leaf recursive calls with $q' = 1$, we explicitly store the minimal witnesses of the entries of $C'$. In the non-leaf recursive calls, we store the correspondence between the indices in the compressed matrices $A'$ and $B'$ and the decompressed matrices $A$ and $B$, as well as the border separating the values of $C_L$ and $C_R$ in $C'$ (the blue curve of Figure 2). Naively, this data takes space proportional to the time complexity of Theorem 1.2. Nevertheless, using bit-masks equipped with rank/select functionality, the total space complexity can be brought down to be linear in terms of the size of the input. To answer a query, we descend the recursion tree. In constant time, we can find the entry of $C'$ corresponding to the queried entry of $C$ and decide on which side of the border separating the values of $C_L$ and $C_R$ the queried entry of $C'$ lies. After that, we recurse into either $C_L$ or $C_R$. In the terminal recursive call, we return the minimal witness that is stored explicitly. The time complexity of the algorithm is proportional to the depth of the recursion. See the full version of the paper [21] for a formal description of the algorithm.
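A toy version of the descent, built on the dense divide-and-conquer sketch above: each recursion node stores an element-wise "left wins" mask in place of the staircase boundary, and a query walks down the tree in time proportional to its depth (all names are ours):

```python
def multiply_with_tree(A, B):
    """Dense min-plus product that also returns a recursion tree whose
    nodes remember, per entry, whether the left half attained the minimum."""
    q = len(B)
    if q == 1:
        C = [[A[i][0] + B[0][k] for k in range(len(B[0]))]
             for i in range(len(A))]
        return C, ("leaf",)
    m = q // 2
    CL, TL = multiply_with_tree([row[:m] for row in A], B[:m])
    CR, TR = multiply_with_tree([row[m:] for row in A], B[m:])
    C = [[min(x, y) for x, y in zip(rl, rr)] for rl, rr in zip(CL, CR)]
    wins = [[x <= y for x, y in zip(rl, rr)] for rl, rr in zip(CL, CR)]
    return C, ("node", m, wins, TL, TR)

def witness(tree, i, k, offset=0):
    """Smallest witness of C[i][k]: one tree descent, O(depth) time."""
    if tree[0] == "leaf":
        return offset
    _, m, wins, TL, TR = tree
    return witness(TL, i, k, offset) if wins[i][k] \
        else witness(TR, i, k, offset + m)

A, B = [[0, 0], [0, -1], [0, -2]], [[0, 2], [2, 0]]
C, tree = multiply_with_tree(A, B)
assert all(C[i][k] == A[i][witness(tree, i, k)] + B[witness(tree, i, k)][k]
           for i in range(3) for k in range(2))
```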
5 Range LIS Queries: Sketch
One of the original applications of Tiskin's procedure for simple unit-Monge matrix multiplication [38, 41] is an algorithm that preprocesses a given sequence of $n$ integers in $\mathcal{O}(n \log^2 n)$ time so that Range Longest Increasing Subsequence (Range LIS) queries can be answered in $\mathcal{O}(\log n)$ time. We show an alternative way of obtaining the same result using Theorem 1.2 and avoiding the seaweed braid formalism. The extension of Theorem 4.7 allows recovering the underlying longest increasing subsequence of length $\ell$ in $\mathcal{O}(\ell \log n)$ time. Further preprocessing allows reducing the reporting time all the way to $\mathcal{O}(\ell)$, which improves upon the result of [27].
Without loss of generality, we assume that $A$ is a permutation of $\{1, \ldots, n\}$. (In what follows, we reserve the word "permutation" for permutations of $\{1, \ldots, n\}$ for some $n$.) We use a popular tool for string similarity problems and interpret range LIS queries as computing the longest path between some pair of vertices of a corresponding alignment graph $G$; see Figure 3. The crucial property of $G$ is that it is planar, and thus, due to [18, Section 2.3], the matrix $M$ of the lengths of the longest paths from the vertices in the bottom row to the vertices of the top row of this graph is anti-Monge. (Note that, in reality, some entries of this matrix are infinite. In the formal description of the algorithm, we augment the alignment graph so that the finite entries of this matrix are preserved and the infinite entries become finite.) Thus, the problem of constructing a data structure for range LIS queries is reduced to the problem of computing the condensed representation of $M$.
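As a sanity check of the anti-Monge property (for intuition only; the formal argument works with the augmented alignment graph), one can brute-force the corresponding inequality on the part of the range-LIS matrix where all four sub-arrays are non-empty:

```python
import itertools, random

def lis(seq):
    """Quadratic-time LIS length; sufficient for a brute-force check."""
    best = []
    for i, x in enumerate(seq):
        best.append(1 + max((best[j] for j in range(i) if seq[j] < x),
                            default=0))
    return max(best, default=0)

n = 8
A = random.sample(range(1, n + 1), n)  # a random permutation
M = [[lis(A[l:r + 1]) for r in range(n)] for l in range(n)]
for l, l2 in itertools.combinations(range(n), 2):
    for r, r2 in itertools.combinations(range(n), 2):
        if l2 <= r:  # all four sub-arrays non-empty
            assert M[l][r] + M[l2][r2] >= M[l][r2] + M[l2][r]
```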
As the LIS of $A[l \,..\, r]$ has length at most $r - l + 1 \le n$ for all $l \le r$, all entries of $M$ are bounded by $n$, and thus $\delta(M) = \mathcal{O}(n)$ holds due to Observation 3.1 and the fact that $\delta \le \sigma$ holds for integer-valued matrices. We compute $M$ in a divide-and-conquer fashion. We split the sequence $A$ into subsequences $A_{\mathrm{lo}}$ and $A_{\mathrm{hi}}$ containing all values in $\{1, \ldots, \lfloor n/2 \rfloor\}$ and $\{\lfloor n/2 \rfloor + 1, \ldots, n\}$, respectively, and recursively compute the corresponding matrices $M_{\mathrm{lo}}$ and $M_{\mathrm{hi}}$. After that, we note that $M_{\mathrm{lo}}$ and $M_{\mathrm{hi}}$ are essentially compressed versions of the lower half and the upper half of $G$, respectively. Based on this, we transform $M_{\mathrm{lo}}$ in $\mathcal{O}(n)$ time into the matrix of the longest distances from the vertices in the bottom row of $G$ to the vertices in the middle row of $G$. Analogously, we transform $M_{\mathrm{hi}}$ into the matrix of the longest distances from the middle row of $G$ to the top row of $G$. To obtain $M$, it remains to $(\max, +)$-multiply these two matrices using Theorem 1.2. Every single recursive call takes $\mathcal{O}(n \log n)$ time, dominated by the algorithm of Theorem 1.2. The whole divide-and-conquer procedure takes $\mathcal{O}(n \log^2 n)$ time. Given the condensed representation of $M$, we use Fact 4.1 to create an oracle for $\mathcal{O}(\log n)$-time range LIS queries, thus replicating the result of [41].
Compared to the results of [41], this algorithm operates directly on the local LIS values and thus can be easily converted into an algorithm for the reporting version of range LIS queries, where we want to not only find the length of the longest path in the alignment graph but to also compute the structure of the underlying path itself. For that, we simply use the witness reconstruction oracle of Theorem 4.7 to find the midpoint of the path and descend the recursion tree. This costs $\mathcal{O}(\log n)$ time per recursive call and allows reconstructing the entire length-$\ell$ LIS in time $\mathcal{O}(\ell \log n)$. A similar recursive LIS reporting scheme is used in [9].
By treating increasing subsequences of length at most $\log n$ separately, we further obtain an algorithm with $\tilde{\mathcal{O}}(n^{1+\varepsilon})$ preprocessing time and $\mathcal{O}(\ell)$ reporting time. This improves upon the result of [27], which requires $\tilde{\mathcal{O}}(n^{3/2})$-time preprocessing for $\tilde{\mathcal{O}}(\ell)$-time reporting.
The formal proofs of the results in this section are given in the full version of the paper [21].
References
- [1] Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter W. Shor, and Robert E. Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2:195–208, 1987. doi:10.1007/BF01840359.
- [2] Noga Alon, Zvi Galil, and Oded Margalit. On the exponent of the all pairs shortest path problem. Journal of Computer and System Sciences, 54(2):255–262, 1997. doi:10.1006/JCSS.1997.1388.
- [3] Alberto Apostolico, Mikhail J. Atallah, Lawrence L. Larmore, and Scott McFaddin. Efficient parallel algorithms for string editing and related problems. SIAM Journal on Computing, 19(5):968–988, 1990. doi:10.1137/0219066.
- [4] Gary Benson. A space efficient algorithm for finding the best nonoverlapping alignment score. Theoretical Computer Science, 145(1&2):357–369, 1995. doi:10.1016/0304-3975(95)92848-R.
- [5] Glencora Borradaile, Philip N. Klein, Shay Mozes, Yahav Nussbaum, and Christian Wulff-Nilsen. Multiple-source multiple-sink maximum flow in directed planar graphs in near-linear time. SIAM Journal on Computing, 46(4):1280–1303, 2017. doi:10.1137/15M1042929.
- [6] Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM Journal on Computing, 48(2):481–512, 2019. doi:10.1137/17M112720X.
- [7] Rainer E. Burkard. Monge properties, discrete convexity and applications. European Journal of Operational Research, 176(1):1–14, 2007. doi:10.1016/J.EJOR.2005.04.050.
- [8] Rainer E. Burkard, Bettina Klinz, and Rüdiger Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95–161, 1996. doi:10.1016/0166-218X(95)00103-X.
- [9] Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Nearly optimal parallel algorithms for longest increasing subsequence. In Kunal Agrawal and Julian Shun, editors, 35th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2023, pages 249–259. ACM, 2023. doi:10.1145/3558481.3591078.
- [10] Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM Journal on Computing, 39(5):2075–2089, 2010. doi:10.1137/08071990X.
- [11] Panagiotis Charalampopoulos, Paweł Gawrychowski, Yaowei Long, Shay Mozes, Seth Pettie, Oren Weimann, and Christian Wulff-Nilsen. Almost optimal exact distance oracles for planar graphs. Journal of the ACM, 70(2):12:1–12:50, 2023. doi:10.1145/3580474.
- [12] Panagiotis Charalampopoulos, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. An almost optimal edit distance oracle. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, volume 198 of LIPIcs, pages 48:1–48:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ICALP.2021.48.
- [13] Panagiotis Charalampopoulos, Tomasz Kociumaka, and Shay Mozes. Dynamic string alignment. In Inge Li Gørtz and Oren Weimann, editors, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pages 9:1–9:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.CPM.2020.9.
- [14] Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster pattern matching under edit distance: A reduction to dynamic puzzle matching and the seaweed monoid of permutation matrices. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pages 698–707. IEEE, 2022. doi:10.1109/FOCS54457.2022.00072.
- [15] Shucheng Chi, Ran Duan, and Tianle Xie. Faster algorithms for bounded-difference min-plus product. In Joseph (Seffi) Naor and Niv Buchbinder, editors, 33rd ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 1435–1447. SIAM, 2022. doi:10.1137/1.9781611977073.60.
- [16] Shucheng Chi, Ran Duan, Tianle Xie, and Tianyi Zhang. Faster min-plus product for monotone instances. In Stefano Leonardi and Anupam Gupta, editors, 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022, pages 1529–1542. ACM, 2022. doi:10.1145/3519935.3520057.
- [17] Anita Dürr. Improved bounds for rectangular monotone min-plus product and applications. Information Processing Letters, 181:106358, 2023. doi:10.1016/J.IPL.2023.106358.
- [18] Jittat Fakcharoenphol and Satish Rao. Planar graphs, negative weight edges, shortest paths, and near linear time. Journal of Computer and System Sciences, 72(5):868–889, 2006. doi:10.1016/j.jcss.2005.05.007.
- [19] Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, and Barna Saha. How compression and approximation affect efficiency in string distance measures. In Joseph (Seffi) Naor and Niv Buchbinder, editors, 33rd ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 2867–2919. SIAM, 2022. doi:10.1137/1.9781611977073.112.
- [20] Paweł Gawrychowski. Faster algorithm for computing the edit distance between SLP-compressed strings. In Liliana Calderón-Benavides, Cristina N. González-Caro, Edgar Chávez, and Nivio Ziviani, editors, 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012, volume 7608 of LNCS, pages 229–236. Springer, 2012. doi:10.1007/978-3-642-34109-0_24.
- [21] Paweł Gawrychowski, Egor Gorbachev, and Tomasz Kociumaka. Core-sparse Monge matrix multiplication: Improved algorithm and applications, 2024. arXiv:2408.04613v2.
- [22] Egor Gorbachev and Tomasz Kociumaka. Bounded edit distance: Optimal static and dynamic algorithms for small integer weights. In 57th Annual ACM Symposium on Theory of Computing, STOC 2025, pages 2157–2166, New York, NY, USA, 2025. Association for Computing Machinery. doi:10.1145/3717823.3718168.
- [23] Yuzhou Gu, Adam Polak, Virginia Vassilevska Williams, and Yinzhan Xu. Faster monotone min-plus product, range mode, and single source replacement paths. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, volume 198 of LIPIcs, pages 75:1–75:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ICALP.2021.75.
- [24] Danny Hermelin, Gad M. Landau, Shir Landau, and Oren Weimann. Unified compression-based acceleration of edit-distance computation. Algorithmica, 65(2):339–353, 2013. doi:10.1007/S00453-011-9590-6.
- [25] Giuseppe F. Italiano, Adam Karczmarz, Jakub Łącki, and Piotr Sankowski. Decremental single-source reachability in planar digraphs. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1108–1121. ACM, 2017. doi:10.1145/3055399.3055480.
- [26] Sampath Kannan and Eugene W. Myers. An algorithm for locating nonoverlapping regions of maximum alignment score. SIAM Journal on Computing, 25(3):648–662, 1996. doi:10.1137/S0097539794262677.
- [27] Karthik C. S. and Saladi Rahul. Range longest increasing subsequence and its relatives: Beating quadratic barrier and approaching optimality, 2024. doi:10.48550/arXiv.2404.04795.
- [28] Tomasz Kociumaka and Saeed Seddighin. Improved dynamic algorithms for longest increasing subsequence. In Samir Khuller and Virginia Vassilevska Williams, editors, 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 640–653. ACM, 2021. doi:10.1145/3406325.3451026.
- [29] Jaehyun Koo. On range LIS queries, 2023. Accessed: 2025-04-22. URL: https://codeforces.com/blog/entry/111625.
- [30] Jaehyun Koo. An optimal MPC algorithm for subunit-Monge matrix multiplication, with applications to LIS. In Kunal Agrawal and Erez Petrank, editors, 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024, pages 145–154. ACM, 2024. doi:10.1145/3626183.3659974.
- [31] Peter Krusche and Alexander Tiskin. New algorithms for efficient parallel string comparison. In Friedhelm Meyer auf der Heide and Cynthia A. Phillips, editors, 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2010, pages 209–216. ACM, 2010. doi:10.1145/1810479.1810521.
- [32] Gad M. Landau and Michal Ziv-Ukelson. On the common substring alignment problem. Journal of Algorithms, 41(2):338–359, 2001. doi:10.1006/JAGM.2001.1191.
- [33] Nikita Mishin, Daniil Berezun, and Alexander Tiskin. Efficient parallel algorithms for string comparison. In Xian-He Sun, Sameer Shende, Laxmikant V. Kalé, and Yong Chen, editors, 50th International Conference on Parallel Processing, ICPP 2021, pages 50:1–50:10. ACM, 2021. doi:10.1145/3472456.3472489.
- [34] Luís M. S. Russo. Monge properties of sequence alignment. Theoretical Computer Science, 423:30–49, 2012. doi:10.1016/J.TCS.2011.12.068.
- [35] Yoshifumi Sakai. A substring-substring LCS data structure. Theoretical Computer Science, 753:16–34, 2019. doi:10.1016/J.TCS.2018.06.034.
- [36] Yoshifumi Sakai. A data structure for substring-substring LCS length queries. Theoretical Computer Science, 911:41–54, 2022. doi:10.1016/J.TCS.2022.02.004.
- [37] Jeanette P. Schmidt. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM Journal on Computing, 27(4):972–992, 1998. doi:10.1137/S0097539795288489.
- [38] Alexander Tiskin. Semi-local string comparison: algorithmic techniques and applications, 2007. arXiv:0707.3619.
- [39] Alexander Tiskin. Semi-local longest common subsequences in subquadratic time. Journal of Discrete Algorithms, 6(4):570–581, 2008. doi:10.1016/J.JDA.2008.07.001.
- [40] Alexander Tiskin. Threshold approximate matching in grammar-compressed strings. In Jan Holub and Jan Zdárek, editors, Prague Stringology Conference 2014, pages 124–138. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2014. URL: http://www.stringology.org/event/2014/p12.html.
- [41] Alexander Tiskin. Fast distance multiplication of unit-Monge matrices. Algorithmica, 71(4):859–888, 2015. doi:10.1007/S00453-013-9830-Z.
- [42] Alexandre Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571–603, 2008. doi:10.1007/S11786-007-0033-3.
- [43] Virginia Vassilevska and Ryan Williams. Finding a maximum weight triangle in $n^{3-\delta}$ time, with applications. In Jon M. Kleinberg, editor, 38th Annual ACM Symposium on Theory of Computing, STOC 2006, pages 225–231. ACM, 2006. doi:10.1145/1132516.1132550.
- [44] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In International Congress of Mathematicians, ICM 2018, pages 3447–3487. World Scientific, 2018. doi:10.1142/9789813272880_0188.
- [45] Dan E. Willard. New data structures for orthogonal range queries. SIAM Journal on Computing, 14(1):232–253, 1985. doi:10.1137/0214019.
- [46] R. Ryan Williams. Faster all-pairs shortest paths via circuit complexity. SIAM Journal on Computing, 47(5):1965–1985, 2018. doi:10.1137/15M1024524.
- [47] Virginia Vassilevska Williams and Yinzhan Xu. Truly subcubic min-plus product for less structured matrices, with applications. In Shuchi Chawla, editor, 31st ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, pages 12–29. SIAM, 2020. doi:10.1137/1.9781611975994.2.
- [48] F. Frances Yao. Speed-up in dynamic programming. SIAM Journal on Algebraic Discrete Methods, 3(4):532–540, 1982. doi:10.1137/0603055.
- [49] Raphael Yuster. Efficient algorithms on sets of permutations, dominance, and real-weighted APSP. In Claire Mathieu, editor, 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, pages 950–957. SIAM, 2009. doi:10.1137/1.9781611973068.103.
