
Fast Gaussian Elimination for Low Treewidth Matrices

Martin Fürer, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA; Carlos Hoppen, Instituto de Matemática e Estatística, Universidade Federal do Rio Grande do Sul, Brazil; Vilmar Trevisan, Instituto de Matemática e Estatística, Universidade Federal do Rio Grande do Sul, Brazil
Abstract

Let A = (a_{ij}) be an m×n matrix whose elements lie in an arbitrary field 𝔽, and let G be the bipartite graph with vertex set {v_1, …, v_m} ∪ {w_1, …, w_n} such that vertices v_i and w_j are adjacent if and only if a_{ij} ≠ 0. We introduce an algorithm that finds an m×n matrix U in row echelon form and a permutation matrix Q of order n such that AQ is row equivalent to U. If a tree decomposition 𝒯 of G of width k and size O(k(m+n)) is part of the input, then Q and the columns of U that contain a pivot can be computed in time O(k²(m+n)). Among other things, this allows us to compute the rank and the determinant of A in time O(k²(m+n)). It also allows us to decide in time O(k²(m+n)) whether the linear system Ax = b has a solution, and to compute a solution of the linear system in case it exists.

Keywords and phrases:
Gaussian elimination, FPT algorithms, treewidth
Funding:
Carlos Hoppen: C. Hoppen acknowledges the support of CNPq 315132/2021-3.
Vilmar Trevisan: V. Trevisan acknowledges partial support of CNPq grants 409746/2016-9 and 303334/2016-9, CAPES under project MATHAMSUD 18-MATH-01 and FAPERGS under Project PqG 17/2551-0001. CNPq is Conselho Nacional de Desenvolvimento Científico e Tecnológico, CAPES is Coordenação de Aperfeiçoamento de Pessoal de Nível Superior and FAPERGS is Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul.
Copyright and License:
© Martin Fürer, Carlos Hoppen, and Vilmar Trevisan; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

Reducing a matrix to a desired form or decomposing a matrix into a product of matrices with a given structure are basic operations in linear algebra. Problems involving triangular matrices and matrices in row echelon form are often easy to solve, and a large number of tasks may be accomplished by first reducing a general matrix to this form. To be precise, a matrix A = (a_{ij}) is in row echelon form if all rows consisting entirely of zeros are at the bottom, and the following holds for the remaining rows. If i is a nonzero row, its pivot is the element a_{ij} ≠ 0 with least j. If rows i_1 < i_2 have pivots in columns j_1 and j_2, respectively, then j_1 < j_2.
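For instance, the matrix

2 1 0 3
0 0 5 1
0 0 0 4
0 0 0 0

is in row echelon form: the unique zero row is at the bottom, rows 1, 2 and 3 have pivots 2, 5 and 4, and the pivot columns 1, 3 and 4 increase strictly from top to bottom.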

A basic result in linear algebra states that every matrix can be transformed into a matrix in row echelon form by a sequence of three types of row operations, known as elementary row operations. In this paper, we shall refer to two of these types: a type I elementary row operation on a matrix interchanges two rows, while a type II elementary row operation adds a multiple of a row to another row. We say that A and B are row equivalent if A can be transformed into B by a sequence of elementary row operations. The process of reducing a matrix to row echelon form by elementary row operations is known as Gaussian elimination. The standard school algorithm reduces an arbitrary m×n input matrix A with entries in a field 𝔽 to row echelon form in O(mn · min{m, n}) field operations. However, it can be faster if more is known about the structure of A. For instance, if A is sparse, its structure can be exploited if we find a pivoting scheme (also known as an elimination ordering) that minimizes the fill-in, defined as the set of matrix positions that were initially 0, but became nonzero at some point during the computation. It is worth noting that sparse matrices often occur for some structural reason, such as small treewidth. Gaussian elimination cannot be used efficiently on random sparse matrices with any elimination order, as we would expect such matrices to quickly get dense as the algorithm progresses. We should mention that other strategies can be used to exploit the sparsity of the input matrix; see, for instance, the work of Peng and Vempala [12].
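For concreteness, here is a minimal sketch of the school algorithm over an exact field; all names are ours, and Python's Fraction stands in for arbitrary field arithmetic.

```python
from fractions import Fraction

def row_echelon(A):
    """Reduce a dense m x n matrix to row echelon form in place, using
    type I (swap) and type II (add a multiple of a row) operations;
    O(mn * min(m, n)) field operations, as in the text."""
    m = len(A)
    n = len(A[0]) if m else 0
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break
        # find a nonzero entry in this column at or below pivot_row
        r = next((i for i in range(pivot_row, m) if A[i][col] != 0), None)
        if r is None:
            continue
        A[pivot_row], A[r] = A[r], A[pivot_row]  # type I operation
        for i in range(pivot_row + 1, m):        # type II operations
            f = A[i][col] / A[pivot_row][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[pivot_row])]
        pivot_row += 1
    return A

A = [[Fraction(x) for x in row] for row in ([0, 2, 1], [1, 1, 0], [2, 2, 1])]
print(row_echelon(A))  # rank = number of nonzero rows
```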

Much of the early focus has been on the question of characterizing the graphs corresponding to matrices with a perfect elimination order, meaning that there is a pivot strategy for Gaussian elimination without any fill-in. Parter [11] assigned a graph to a matrix A with an edge {i, j} whenever a_{ij} or a_{ji} is non-zero. He then showed that there is a perfect elimination order if the graph is a tree. As became customary afterwards, he chooses all pivots in the diagonal and makes the strong assumption that all diagonal elements of the matrix are non-zero and remain so during the computation, usually referred to as not having any accidental cancellation.

Obviously, this assumption simplifies Gaussian elimination enormously. It is justified because it is true for symmetric positive definite matrices, which are the most important special class of matrices for systems of linear equations. However, one would like to see more widely applicable methods.

Rose [16] shows that Parter’s result can be extended to all triangulated graphs (also known as chordal graphs). Rose, Tarjan, and Lueker [18] show that a graph has a perfect elimination order if and only if the graph is triangulated. In this case, they find a perfect elimination order in O(|V|+|E|) time. Rose and Tarjan [17] efficiently find a perfect elimination order in a directed graph if one exists, and show that finding a minimal elimination ordering (minimizing the fill-in) is NP-hard. A directed graph describes the non-zero off-diagonal entries in an n×n matrix, where Gaussian elimination is done by choosing pivots in the diagonal under the assumption that there will never be a 0 in the diagonal.

The perfect elimination question is more complicated for the case of asymmetric m×n matrices A, where any nonzero pivot is allowed. Any m×n matrix A = (a_{ij}) may be naturally associated with a bipartite graph G with vertex set {v_1, …, v_m} ∪ {w_1, …, w_n} such that vertices v_i and w_j are adjacent if and only if a_{ij} ≠ 0. We say that G is the underlying bipartite graph of A. This allows us to employ structural decompositions from graph theory to deal with the nonzero entries of A in an efficient way. Unlike the symmetric case with pivots restricted to the diagonal, no nice characterization is known for the bipartite graphs with a perfect elimination order when arbitrary pivots are allowed. Nevertheless, a very large class is given by the chordal bipartite graphs defined by Golumbic and Goss [7]. These graphs are not chordal, as obviously only forests are both bipartite and chordal. Still, chordal bipartite graphs are characterized similarly to chordal graphs. For chordal graphs, the defining property is that every cycle of length at least 4 has a chord, while chordal bipartite graphs are defined by the requirement that every cycle of length at least 6 has a chord. Golumbic and Goss [7] have shown that chordal bipartite graphs have a perfect elimination order.

They are not the only bipartite graphs with a perfect elimination order, but they are the only ones where one can always pick any simplicial edge (definition follows) as a pivot [7]. An edge {u, v} is simplicial if the union of the neighborhoods of u and v induces a complete bipartite graph. It is easy to see that a bipartite graph G has a perfect elimination order if there exists an ordering of a matching {(u_1, v_1), (u_2, v_2), …, (u_p, v_p)} in G such that (u_i, v_i) is simplicial in the subgraph G[V ∖ {u_1, v_1, u_2, v_2, …, u_{i−1}, v_{i−1}}] for i ∈ {1, …, p}.
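As an illustration, the simplicial condition is straightforward to test from an adjacency-set representation; the sketch below is ours (illustrative names, not from the paper) and assumes the bipartite graph is given as a dict mapping each vertex to its set of neighbors.

```python
def is_simplicial(G, u, v):
    """Edge {u, v} is simplicial if N(u) and N(v) induce a complete
    bipartite graph, i.e., every neighbor of v is adjacent to every
    neighbor of u.  (Since u is in N(v) and v is in N(u), this also
    checks that {u, v} is indeed an edge.)"""
    return all(b in G[a] for a in G[v] for b in G[u])

# Example: the 4-cycle v1-w1-v2-w2-v1 is chordal bipartite,
# and every one of its edges is simplicial.
G = {"v1": {"w1", "w2"}, "v2": {"w1", "w2"},
     "w1": {"v1", "v2"}, "w2": {"v1", "v2"}}
print(is_simplicial(G, "v1", "w1"))  # True
```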

Radhakrishnan, Hunt, and Stearns [13] perform Gaussian elimination for symmetric matrices with a given tree decomposition of width k in time O(k²n). As has been common, they implicitly assume that the diagonal is non-zero and that there is no accidental cancellation. Naturally, given today’s familiarity with algorithms based on tree decompositions, it would now be an easy exercise to design such an algorithm when pivots can always be selected in the diagonal.

Only in recent years has there finally been some progress towards the difficult problem of efficient Gaussian elimination for matrices of small treewidth without the simplifying assumption that pivots can always be chosen in the diagonal. Fomin et al. [5] have given a Gaussian elimination algorithm running in time O(p²n), where p is the lesser known tree-partition width parameter or the pathwidth. The tree-partition width of a graph can be arbitrarily large already for graphs of constant treewidth, unless the maximum degree is bounded [19]. However, Fomin et al. [5] still obtained an O(k³n) algorithm for treewidth k. We should also refer to the work of Dong, Lee, and Ye [4], whose main result implies an O(k²n) algorithm for the solution of a linear system Ax = b such that the m×n matrix A has full rank (actually, their result is about the solution of a linear program).

Naturally, the longstanding main goal is an O(k²n) Gaussian elimination algorithm for treewidth k without restrictive assumptions. This would provide a smooth transition to the school algorithm as k approaches n. It would also achieve the same running time bound for the general case as for the easy case of a non-zero diagonal and no accidental cancellation. It actually seems that the O(k²n) goal was not explicitly formulated until recently [5], maybe because it had not seemed achievable. However, once this goal had been attained by simple algorithms under very strong assumptions, the question of whether these assumptions (that convenient entries are nonzero and thus can be used as pivots) are necessary must have been obvious.

We make substantial progress in this direction by devising the algorithm Fast Gaussian Elimination. Given an m×n matrix A over a field 𝔽 with underlying bipartite graph G, and given a tree decomposition 𝒯 of G of width k and size O(k(m+n)), it produces a matrix U in row echelon form that is row equivalent to A. The entries of the columns of U that contain a pivot (and therefore generate a space of dimension rank(A)) are fully computed in time O(k²(m+n)). The entries of the remaining columns of U are described implicitly as linear combinations of at most k other columns, and may be computed explicitly in time O(km(n − rank(A))). In particular, this algorithm allows us to compute the rank and the determinant of A in time O(k²(m+n)). It also allows us to decide in time O(k²(m+n)) whether the linear system Ax = b has a solution, and to compute a solution of the linear system in case it exists. Our main result is stated more formally below.

Theorem 1.

Let A be an m×n matrix over a field 𝔽, let G be its underlying bipartite graph, and let 𝒯 be a tree decomposition of G of width k and size O(k(m+n)). Algorithm Fast Gaussian Elimination produces an m×n matrix U in row echelon form, and a permutation matrix Q of order n with the property that AQ is row equivalent to U. The permutation matrix Q and the columns of U that contain a pivot may be computed with O(k²(m+n)) field operations. The full matrix U may be computed with O(km(n − rank(A))) additional field operations.

We actually also compute a permutation matrix P of order m such that U is the row echelon form that would be obtained from PAQ by the most standard algorithm (adding multiples of earlier rows to later rows).

Our algorithm is not based on the O(k³n) algorithm of Fomin et al. [5]; it uses standard tree decompositions in a natural way, and does not rely on the somewhat obscure notion of tree-partition width. Even though our algorithm removes the decades-old severe restriction to pivoting in the main diagonal, the main idea of our Gaussian elimination algorithm is quite simple. The tree decomposition provides an elimination order for rows (equations) and columns (variables). This order sequentially hands out permissions to rows and columns to be eliminated. However, when a row i receives a green light to be processed, it is only processed immediately if it contains a variable that has already received a green light. Otherwise, this row is put into a buffer, delaying it until one of its variables x_j also has a green light. Then a_{ij} is chosen as a pivot. Likewise, when a variable receives a green light then, if needed, its column is put into a buffer until some row containing this variable has a green light.

As was the case in Fomin et al. [5], we use a nice tree decomposition of the underlying bipartite graph G as defined by Kloks [9]. However, to deal with 0’s at intended pivot locations, we create temporary buffers for rows and columns that allow us to delay the selection of pivots as discussed above. As is typical for algorithms based on tree decompositions, row operations are actually performed on small submatrices of the input matrix. For simplicity, the algorithm does not manipulate the large given matrix. Instead, these small submatrices, called boxes, are maintained separately. The small submatrices in this paper include additional information to keep track of the rows and columns that lie in the temporary buffer, which allows us to make headway towards the output matrix in row echelon form even when accidental cancellations occur. It is crucial to perform so-called bookkeeping operations that ensure that the temporary buffers remain small throughout the execution of the algorithm.

The paper is organized as follows. The algorithm is described in Section 2, where we also discuss its correctness. The running time is analyzed in Section 3. We should mention that, to achieve this running time, we need to keep track of the field operations in a global way, rather than adding the contributions of the worst-case scenarios at each step. Together, these two sections establish Theorem 1. We provide several consequences of our result in Section 4.

2 The Algorithm

We now describe the algorithm Fast Gaussian Elimination, and we start with the definition of a tree decomposition. Let G = (V, E) be a ν-vertex graph, with the standard assumption that V = [ν] = {1, …, ν}. A tree decomposition of a graph G is a tree 𝒯 with nodes {1, …, r}, where each node t is associated with a bag B_t ⊆ V, satisfying the following properties: (1) ⋃_{t=1}^{r} B_t = V; (2) for every edge {v, w} ∈ E, there exists a bag B_t containing both v and w; (3) for any v ∈ V, the subgraph of 𝒯 induced by the nodes whose bags contain v is connected. The width of the tree decomposition 𝒯 is defined as max_t (|B_t| − 1), and the treewidth tw(G) of the graph G is the smallest k such that G has a tree decomposition of width k. Tree decompositions were popularized by the seminal work of Robertson and Seymour [14, 15], but similar or even equivalent definitions had appeared in earlier work, see [1, 8]. As mentioned in the introduction, we use the concept of a nice tree decomposition introduced by Kloks [9], which is a rooted tree decomposition 𝒯 of a graph G in which every node has one of the following types:

(a) (Leaf) The node t is a leaf of 𝒯;

(b) (Introduce) The node t introduces vertex v, that is, it has a single child s, v ∉ B_s and B_t = B_s ∪ {v};

(c) (Forget) The node t forgets vertex v, that is, t has a single child s, v ∉ B_t and B_s = B_t ∪ {v};

(d) (Join) The node t is a join, that is, it has two children s and s′, and B_t = B_s = B_{s′}.

We further assume that the bag associated with the root is empty. This is easy to accomplish, as any tree decomposition 𝒯 that satisfies (a)-(d), but does not satisfy this additional requirement, may be turned into a nice tree decomposition with empty root bag by appending a path of length at most k+1 to the root of 𝒯, where all nodes have type forget. Having a root with an empty bag ensures that each vertex of G=(V,E) is associated with exactly one forget node that forgets it. Moreover, it ensures that, for any vV, the node of 𝒯 that is closest to the root among all t such that vBt is the child of the node that forgets v. Despite the additional structure, a nice tree decomposition may be efficiently derived from an arbitrary tree decomposition. More precisely, Kloks [9, Lemma 13.1.2] has shown that, if G is a graph of order ν and we are given an arbitrary tree decomposition of G with width k and r nodes, it is possible to turn it into a nice tree decomposition of G with at most 4ν nodes and width at most k in time O(k(ν+r)). Regarding the computation of a tree decomposition, Bodlaender’s theorem [2] provides an optimal tree decomposition in linear time when the treewidth k is a constant. Bodlaender et al. [3] and Korhonen [10] compute faster constant factor approximations in linear time. Thus, for bounded k, there is no need to assume a tree decomposition is given, since both the tree decomposition and the Gaussian elimination are then computed in time O(n). However, the constants hidden in the O-notation grow quickly with k.
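For concreteness, the three defining properties of a tree decomposition can be checked directly from a bag representation. The sketch below is ours (illustrative names, not from the paper); it assumes 𝒯 is given by its edge list and uses, for property (3), the fact that a subgraph of a tree is connected exactly when its edge count is one less than its vertex count.

```python
from itertools import chain

def is_tree_decomposition(bags, tree_edges, graph_edges, vertices):
    """Verify properties (1)-(3) of a tree decomposition.

    bags: dict node -> set of vertices of G; tree_edges: edge list of T;
    graph_edges: iterable of pairs (v, w); vertices: vertex set of G.
    Assumes the nodes and tree_edges really form a tree."""
    # (1) every vertex of G lies in some bag
    if set(chain.from_iterable(bags.values())) != set(vertices):
        return False
    # (2) every edge of G is contained in some bag
    if not all(any({v, w} <= bag for bag in bags.values())
               for v, w in graph_edges):
        return False
    # (3) for each vertex, the nodes whose bags contain it induce a subtree
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        inside = sum(1 for s, t in tree_edges if s in nodes and t in nodes)
        if inside != len(nodes) - 1:  # a subtree on p nodes has p-1 edges
            return False
    return True

def width(bags):
    """Width of the decomposition: max_t (|B_t| - 1)."""
    return max(len(bag) for bag in bags.values()) - 1
```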

We are now ready to describe the algorithm. Let A be an m×n matrix with entries in a field 𝔽 and let G = (V, E) be the underlying bipartite graph with vertex set V = {v_1, …, v_m} ∪ {w_1, …, w_n} associated with it. We wish to find an m×m permutation matrix P, an n×n permutation matrix Q and an m×n matrix U in row echelon form such that PAQ is row equivalent to U in a direct way. We shall operate on a nice tree decomposition 𝒯 of G with node set [r] and width k rather than on the matrix A itself. The algorithm Fast Gaussian Elimination works bottom-up on the rooted tree 𝒯, that is, it only processes a node t after its children have been processed. Each node t except the root produces a data structure known as a box and transmits it to its parent. A box is a matrix A_t with O(k) rows and columns. It may be recorded as a triple of matrices (M_t, N_t, J_t) whose rows and columns are labeled by distinct elements of {v_1, …, v_m} and {w_1, …, w_n}, respectively, associating rows and columns of the boxes with rows and columns of the original matrix. We may view it as

A_t = \begin{pmatrix} \mathbf{0}_{m_t \times n_t} & M_t \\ N_t & J_t \end{pmatrix}, (1)

where J_t is an r_t × c_t matrix whose rows and columns are labeled by the elements in B_t, the bag associated with node t. Here r_t is the number of row vertices (i.e., vertices v_i) in B_t and c_t is the number of column vertices (i.e., vertices w_j) in B_t. The matrices M_t and N_t have dimensions m_t × c_t and r_t × n_t, respectively, where m_t < 2c_t and n_t < 2r_t. Observe that some of these matrices can be degenerate, in the sense that m_t = 0, n_t = 0, c_t = 0 or r_t = 0. In a case where m_t = 0, n_t = 2, c_t = 2 and r_t = 3, for instance, the box A_t is a 3×4 matrix, M_t is a matrix with no rows and two empty columns, and both N_t and J_t are 3×2 matrices. We should note that a similar data structure, also called a box, has been exploited in the context of computing a diagonal matrix that is congruent to a given symmetric matrix using a tree decomposition of its underlying graph [6].
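In an implementation, a box is just this triple of small labeled matrices; the sketch below (our naming, not from the paper) stores the blocks M_t, N_t, J_t together with the labels that tie their rows and columns back to the rows and columns of A.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """The box A_t = (M_t, N_t, J_t) of display (1); rows of M carry
    buffer-row labels v_i, columns of N carry buffer-column labels w_j,
    and the rows/columns of J are labeled by the bag B_t."""
    M: list  # m_t x c_t block: buffer rows, restricted to regular columns
    N: list  # r_t x n_t block: regular rows, restricted to buffer columns
    J: list  # r_t x c_t block: net changes to entries indexed by B_t
    buffer_row_labels: list   # m_t labels v_i
    buffer_col_labels: list   # n_t labels w_j
    regular_row_labels: list  # r_t labels v_i (row vertices of B_t)
    regular_col_labels: list  # c_t labels w_j (column vertices of B_t)
```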

It will also be useful to assign a canonical ordering to the rows and columns of the input matrix. We assume that they are ranked according to the order in which their Forget nodes appear in some topological ordering (e.g., obtained by a post-order traversal of 𝒯), that is, v_1 is the first vertex that is forgotten among all vertices that represent rows, v_2 is the second, and so on. The same applies to w_1, …, w_n.

We start with an intuition about the meaning of the boxes and about how the algorithm works. Recall that the nodes of the nice tree decomposition 𝒯 are ordered bottom-up as 1, …, r, where r is the root and B_r = ∅. At each node t ∈ V(𝒯), the algorithm initializes box A_t (if t is a leaf) or produces a new box A_t based on its bag B_t and on the boxes transmitted by its children. While producing the box A_t, the algorithm may access entries a_{ij} of the input matrix such that both i and j are associated with vertices in B_t. The box A_t is then transmitted to its parent, except if t is the root of 𝒯. We call this processing node t.

It is also useful to keep in mind that the algorithm performs two types of elementary row operations, which we call authentic operations and bookkeeping operations. The former are operations that can be performed in the full matrix A in order to reach row echelon form. The latter are operations that are only performed on the boxes as a way to limit their sizes, but which do not have full matrix counterparts. The effect of processing nodes may be viewed as a sequence of m×n matrices

\tilde{A}_0 = A, \tilde{A}_1, \ldots, \tilde{A}_r = U, (2)

where U is in row echelon form and Ã_t is obtained from Ã_{t−1} by performing the authentic elementary row operations that have been performed while node t is processed. We should mention that this sequence of row equivalent matrices is a simplification of what actually happens, but that it could be achieved if an oracle gave us the proper ordering of rows and columns of A. In reality, the proper orderings are also computed by the algorithm while the nodes are processed, leading to the permutation matrices P and Q mentioned in the statement of Theorem 1.

To achieve the complexity stated in Theorem 1, row operations cannot actually be performed on m×n matrices, but are instead performed only in the (smaller) boxes. It is useful to think that any row or column of the input matrix A, say row i, is initially untouched and changes to a regular row when the vertex v_i associated with it appears in the bag of a node of the branch of the tree decomposition rooted at t (we use “branch” to refer to the subtree induced by all descendants of a node). When v_i is removed from the bag (at the node that forgets it in the nice tree decomposition), either row i is assigned a pivot in some column j, in which case row i is classified as processed (column j is also classified as processed), or row i becomes a buffer row, meaning that it has been given the green light for a pivot, but that the decision of assigning a pivot has been deferred to a later step because no column has received a green light. In the process of keeping the size of the box bounded, the algorithm may decide that it is impossible or unnecessary to assign a pivot to a buffer row or to a buffer column, in which case the status of the row or column also changes to processed. We observe that the classification of each row and column is defined independently for disjoint branches of the tree decomposition, and an important feature of the algorithm is that these independent classifications can be made compatible when two branches are merged (which happens at join nodes).

This finally gets us to the meaning of a box A_t = (M_t, N_t, J_t) at node t. The rows and columns of the matrix J_t are precisely the regular rows and columns of the branch 𝒯_t of 𝒯 rooted at node t. In other words, these are the rows and columns corresponding to the vertices in the bag B_t. Moreover, each entry ij of J_t records the net change to the original entry a_{ij} due to row operations performed while processing nodes in 𝒯_t. The matrix M_t records the entries ij of the matrix Ã_t defined in (2) in the case where i is a buffer row (with respect to 𝒯_t) and j is a regular column. Analogously, N_t records the entries ij of Ã_t in the case where i is a regular row and j is a buffer column. This reveals another important feature of the algorithm: the entries of the full m×n matrix associated with regular rows and regular columns can easily be modified in parallel by computations in disjoint branches, as the algorithm does not capture their exact values while processing a single branch. The boxes just store the net changes to the values due to operations in this branch. However, this does not happen for buffer rows and buffer columns, for which exact values are recorded in the box. The top left corner of A_t in (1) shows that, if i is a buffer row and j is a buffer column, the entry ij in Ã_t is equal to zero. This is consistent with the idea that the algorithm selects a new pivot whenever there are a buffer row i and a buffer column j for which the element ij is nonzero, as both the row and the column have already been given the green light as possible rows and columns for a pivot. So, if row i and column j are in the buffer at the end of step t, then the entry ij must be 0.

While producing the boxes, the algorithm may select pivots for the output matrix U, it may identify zero rows that cannot contain a pivot, and it may identify columns that are linear combinations of other columns (strictly speaking, not necessarily linear combinations of the other columns with respect to the entire matrix, but linear combinations with respect to a submatrix, as depicted in Figure 1), and therefore cannot contain pivots together with the other columns. The status of these rows and columns changes to processed, which means that information about them is recorded by global variables, while they are removed from the box.

After this description, we come back to a matrix Ã_t in (2), depicted in Figure 1 (the authors thank Elizandro Max Borba for the figure). The top left entries lie in processed rows and processed columns that contain pivots. These entries are not modified later in the algorithm. The bottom rows are (processed) zero rows that have been produced while limiting the size of the set of buffer rows (it is indeed impossible to find a pivot in such a row), while the rightmost columns are (processed) columns that have been identified as linear combinations of other columns (except for possible entries in rows whose pivot had already been computed when the column was processed). As a consequence, it is unnecessary to assign a pivot to them. Moreover, while the actual values in the rightmost columns may be modified in later steps of the algorithm, we need not keep track of them, as each such column may be computed as a linear combination of (a known set of) O(k) columns with smaller index at the end of the algorithm. The central submatrix represents entries in the buffer, and in regular or untouched rows and columns. Some of them lie in the box where computations are actually performed in a given step.

Figure 1: A description of an intermediate matrix in the Gaussian elimination process.

Before giving a formal description of every step of the algorithm, we briefly describe what is done at a node of each type. A node of type leaf merely initializes a box with regular rows or columns, while a node of type introduce simply adds a new regular row or column to the box transmitted by its child. Nodes of type join combine the information that is passed on by two different branches of the decomposition, which have been processed independently of each other. Let A_s and A_{s′} denote the boxes produced by the children of t. A new matrix M_t is produced by stacking the matrix M_s on top of M_{s′}, and a new matrix N_t is produced by juxtaposing N_s to the left of N_{s′}. Such matrices M_t and N_t may become too large, in which case row operations are performed to remove some rows and/or columns from the buffer. Producing J_t also requires merging J_s and J_{s′}, as the regular rows and columns are the same. Being able to easily merge the matrices transmitted by the children is the reason why the rows and columns are considered with the canonical ordering mentioned above.

Nodes of type forget concentrate most of the action. Recall that each row i of A is associated with a vertex v_i of G, which in turn is associated with the unique Forget node of 𝒯 that forgets v_i. Analogously, each column j of A is associated with a vertex w_j of G and the unique Forget node of 𝒯 that forgets w_j. Assume that the algorithm is processing a Forget node t, which means that it received a box A_s from its child s. Further assume that node t forgets a vertex v_i corresponding to row i. This means that the algorithm singles out this row as a candidate for having a pivot. In the box A_s, it checks whether there is a column j that has already been singled out for a possible pivot, but which has no pivot so far. These are precisely the columns of N_s. If such a column exists, it checks whether the entry ij in N_s is nonzero. If this is also satisfied, the algorithm performs operations to create a pivot in row i and column j, and they are both removed from the box. If one of the conditions fails, row i is removed from J_s and added to M_t, which contains rows singled out for a pivot, but which have no pivot so far. Furthermore, if the condition on the maximum number of rows of M_t fails after the addition of this new row, the algorithm acts to remove some rows from the buffer.

Analogously, if node t forgets vertex w_j corresponding to column j, it first checks whether there are rows that have been singled out for pivots, but have not been used, namely the rows of M_s. It then looks for one such row i for which the entry ij is nonzero. If such a row exists, the algorithm performs operations to create a pivot in row i and column j, and they are both removed from the box. If one of the conditions fails, column j is removed from J_s and added to N_t, which contains columns singled out for a pivot, but which have no pivot so far. If the condition on the number of columns of N_t fails after the addition of this new column, the algorithm acts to remove some columns from the buffer.

A high level description of the algorithm may be found in Figure 2. Note that, in this description, we have five global variables: the number ρ of pivots already computed, two vectors 𝐮 and 𝐩 of length m, and two vectors 𝐰 and 𝐪 of length n. Initially, these are zero vectors. At the end of the algorithm, vector 𝐮 allows us to reconstruct the rows of the output matrix U, while vectors 𝐩 and 𝐪 encode the permutations that generate P and Q, respectively. As we discussed, the algorithm may also find columns of A that are linear combinations of other columns and remove them from boxes. When this happens, the linear combination that produces such a column is recorded in the vector 𝐰, so that the values of entries of U in these columns can also be recovered at the end of the algorithm.

Figure 2: High level description of the algorithm Fast Gaussian Elimination. The entries of matrix U and the permutations leading to P and Q are constructed stepwise during the ForgetBox and JoinBox operations.

In the remainder of this section, we shall describe each operation in detail. To simplify our discussion, for a node t, we say that buffer rows and buffer columns have type 1, while regular rows and regular columns have type 2.

Leaf Box

When a node t is a leaf corresponding to a bag B_t, we apply procedure Leaf Box. This procedure simply initializes a box A_t where m_t = 0, n_t = 0, and J_t is a zero matrix of dimension r_t × c_t whose rows and columns are labeled by the elements of B_t, which contains r_t row vertices and c_t column vertices. No elementary row operations are performed.

Introduce Box

When a node t introduces a vertex u, its bag satisfies B_t = B_s ∪ {u}, where B_s (which does not contain u) is the bag of its child s. The input of Introduce Box is the vertex u that has been introduced and the box A_s transmitted by its child. The box A_t is obtained by the insertion of a new type 2 row or column labeled u whose elements are all zero (the added row or column may actually be empty if a new row vertex is introduced, but c_s = 0 and n_s = 0, or a new column vertex is introduced, but r_s = 0 and m_s = 0), taking care to insert it in the correct position (with respect to the canonical order of rows and columns of J_t). More precisely, if u is a row vertex v_i, this means that N_t and J_t are obtained from N_s and J_s, respectively, by the addition of a zero row, while M_t = M_s. It is clear that n_t = n_s < 2r_s < 2(r_s + 1) = 2r_t. Similarly, if u is a column vertex w_j, the addition of this column means that M_t and J_t are obtained from M_s and J_s, respectively, by the addition of a zero column, while N_t = N_s. It is clear that m_t = m_s < 2c_s < 2(c_s + 1) = 2c_t. No elementary row operations are performed.
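In the Box representation sketched earlier, both procedures are immediate; the following sketch (ours) shows Leaf Box and the row case of Introduce Box, appending rather than inserting at the canonical position for brevity.

```python
def leaf_box(row_labels, col_labels):
    """Leaf Box: M and N are empty, J is an r_t x c_t zero matrix
    labeled by the bag B_t."""
    J = [[0] * len(col_labels) for _ in row_labels]
    return Box([], [], J, [], [], list(row_labels), list(col_labels))

def introduce_row_vertex(box, v):
    """Introduce Box for a row vertex v_i: N_t and J_t gain a zero row,
    M_t is unchanged.  (A real implementation inserts the row at its
    canonical position instead of appending.)"""
    box.N.append([0] * len(box.buffer_col_labels))
    box.J.append([0] * len(box.regular_col_labels))
    box.regular_row_labels.append(v)
    return box
```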

Join Box

Let t be a node of type Join and let A_s and A_{s′} be the boxes transmitted by its children, where s < s′ < t. By the definition of box, A_s and A_{s′} have the same rows and columns of type 2, that is, J_s and J_{s′} have the same dimension and have rows and columns labeled by the same elements (in the same order). Moreover, because vertices of type 1 have been forgotten in their respective branches, the labels of the rows of M_s and M_{s′} are all different, as are the labels of the columns of N_s and N_{s′}. The JoinBox operation first creates a matrix A_t′ whose rows and columns are labeled by the union of the labels of A_s and A_{s′}, with the structure below.

A_t' = \begin{pmatrix} \mathbf{0}_{m_s \times n_s} & \mathbf{0}_{m_s \times n_{s'}} & M_s \\ \mathbf{0}_{m_{s'} \times n_s} & \mathbf{0}_{m_{s'} \times n_{s'}} & M_{s'} \\ N_s & N_{s'} & J_t \end{pmatrix}. (3)

Here J_t = J_s + J_{s′} has order r_t × c_t and records the net change of the entries of the original matrix due to row operations in the two branches that are being merged at this step. Let M_t′ be the (m_s + m_{s′}) × c_t matrix obtained by stacking M_s on top of M_{s′} (alternatively, we could merge the two matrices M_s and M_{s′} in a way that preserves the canonical ordering of rows; however, this does not affect the final result). Let N_t′ be the r_t × (n_s + n_{s′}) matrix obtained by placing N_s to the left of N_{s′}.

If m_s + m_{s′} < 2c_t, define M_t = M_t′. Otherwise m_s + m_{s′} ≥ 2c_t. Apply Gaussian elimination to M_t′ until we get a matrix M̃_t in row echelon form. Here, we can use a standard algorithm for dense matrices that takes O(c_t³) operations, as M_t′ has c_t columns and at most 4c_t − 2 rows. Since m_s + m_{s′} ≥ 2c_t, this process ends with at least c_t zero rows. The matrix M_t is obtained from M̃_t by removing all the zero rows, and by reordering the rows so that they remain in the original ordering of M_t′. Assume that the zero rows are labeled by the vertices v_{i_1}, …, v_{i_ℓ}. The algorithm updates the permutation vector 𝐩 by assigning the values i_1, …, i_ℓ to the last ℓ components of 𝐩 with value 0, which means that the zero rows of M_t′ become the top zero rows in the description of Figure 1. Given that the new box is defined based on the matrix that has been modified, the type II row operations used for Gaussian elimination are authentic row operations. Given that the order of the rows has been preserved, type I row operations are bookkeeping operations (we observe that the labels of the rows used for authentic row operations will always refer to the original input matrix to account for the fact that row interchanging is not actually performed; the algorithm could easily be adapted to a version where row interchanges are also authentic row operations, but this makes the analysis slightly more involved).
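The following sketch (our code, assuming exact field elements such as fractions.Fraction) illustrates this buffer-row reduction: eliminate to row echelon form, keep the nonzero rows in their original relative order, and report the labels of the zero rows so that 𝐩 can be updated.

```python
def shrink_buffer_rows(rows, labels, c):
    """Reduce a stack of >= 2c buffer rows with c columns: at most c rows
    survive Gaussian elimination; the rest become processed zero rows."""
    work = [list(r) for r in rows]
    pos = list(range(len(work)))  # original position of each working row
    pivot = 0
    for col in range(c):
        r = next((i for i in range(pivot, len(work)) if work[i][col] != 0), None)
        if r is None:
            continue
        # type I swap: bookkeeping only, tracked via pos
        work[pivot], work[r] = work[r], work[pivot]
        pos[pivot], pos[r] = pos[r], pos[pivot]
        # type II eliminations below the pivot: authentic operations
        for i in range(pivot + 1, len(work)):
            f = work[i][col] / work[pivot][col]
            work[i] = [a - f * b for a, b in zip(work[i], work[pivot])]
        pivot += 1
    nonzero = sorted(range(pivot), key=lambda i: pos[i])  # original order
    kept_rows = [work[i] for i in nonzero]
    kept_labels = [labels[pos[i]] for i in nonzero]
    zero_labels = [labels[pos[i]] for i in range(pivot, len(work))]
    return kept_rows, kept_labels, zero_labels
```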

Similarly, if n_s + n_{s′} < 2r_t, define N_t = N_t′. Otherwise, make an auxiliary copy of N_t′ and apply Gaussian elimination on it until we get a matrix Ñ_t in row echelon form. As above, we can use a standard algorithm for dense matrices that takes O(r_t³) operations. There is a set S_t of at most r_t columns that form a basis of the column space of N_t′ (precisely the columns of N_t′ with pivots in Ñ_t). The matrix N_t is obtained from N_t′ by removing the columns that are not in S_t, of which there are at least n_s + n_{s′} − r_t ≥ r_t.

 Remark 2.

We emphasize that N_t is defined in terms of N_t′ rather than Ñ_t in this case, which means that the row operations performed to obtain Ñ_t are not performed in the full matrix while producing the sequence (2), but are instead auxiliary operations to find a basis of the column space of N_t′ and to write the remaining columns as linear combinations of the columns in this basis. This is crucial. If we consider the effect of the row operations in the full matrix, while the nonzero entries of the rows corresponding to M_t in the full matrix lie entirely within M_t (except perhaps for entries in columns that are linear combinations of other columns), there may be nonzero entries in the rows corresponding to N_t that lie outside the box A_t. Indeed, the rows of N_t are regular rows, so they may even be processed in parallel by different branches of the tree decomposition. Adding multiples of them to other rows might affect entries that are not being monitored at this step. This is why the operations performed to turn a copy of N_t′ into row echelon form must be bookkeeping operations.

Coming back to the description of the algorithm, assume that w_{j_1}, …, w_{j_ℓ} are the labels of the columns of N_t′ with no pivot in Ñ_t, so that they are linear combinations of the remaining columns of Ñ_t. The algorithm updates the permutation vector 𝐪 by assigning the values j_1, …, j_ℓ to the last ℓ components of 𝐪 with value 0, which may be seen as moving these columns to the right (see Figure 1). Moreover, the corresponding entries of the vector 𝐰 record information about the linear combinations. For each column w_{j_i}, this information consists of the indices of the columns that produce the linear combination, the coefficients of this linear combination, and the number of pivots ρ at this step of the algorithm (as the linear combinations are with respect to the part of the column that does not include rows for which pivots have been assigned).

Forget Box

Assume that t forgets vertex v and let s be its child, so that B_t = B_s ∖ {v}. This procedure uses A_s to produce a new box A_t in which the row or column associated with v either gets a pivot (and is therefore removed from the box) or changes to type 1.

Assume first that v = v_i for a row i. Consider the matrix A_t′ given below (representing the row associated with v_i as the last row is helpful for visualizing the operations, but this need not be part of an implementation):

A_t' = \begin{pmatrix} \mathbf{0}_{m_s \times n_s} & M_t' \\ N_t' & J_t' \\ \mathbf{x}_i & \mathbf{y}_i \end{pmatrix}, (4)

where M_t′ = M_s, 𝐱_i is row v_i of N_s, N_t′ is obtained from N_s by removing row 𝐱_i, J_t′ is obtained from J_s by removing row v_i, and 𝐲_i is obtained from row v_i of J_s by adding the entry a_{ij} of the input matrix to the entry corresponding to each column w_j.

If 𝐱_i is empty or zero, the algorithm sets J_t = J_t′, and adds 𝐲_i to M_t′ as the new bottom row to produce M_t″. If M_t″ has at most 2c_t − 1 rows, the algorithm calls this matrix M_t and concludes the step. Intuitively, this means that row v_i is a new buffer row. Otherwise M_t″ has 2c_t rows. The algorithm turns this matrix into a matrix M̃_t in row echelon form and removes c_t zero rows of M̃_t to produce M_t, as was done in Join Box. As a byproduct of these authentic row operations, the algorithm updates 𝐩. Similarly, if n_s < 2r_t, we set N_t = N_t′. However, because r_t = r_s − 1 and the number of columns of N_t′ is n_s, it may be that n_s ≥ 2r_t. If this happens, as was the case for Join Box, the algorithm performs bookkeeping operations to define N_t, and updates 𝐪 and 𝐰.

If 𝐱_i is nonempty and nonzero, the algorithm selects the first nonzero entry of 𝐱_i; suppose that it is in column w_j. It then performs row operations in A_t′ in order to eliminate all the other nonzero elements in column w_j (note that these elements are necessarily in rows of type 2, so that only rows in N_t′ and J_t′ are modified). The algorithm defines 𝐩(ρ) = i and 𝐪(ρ) = j, while 𝐮(ρ) records information about the nonzero entries of the vector [𝐱_i 𝐲_i], namely the labels of the nonzero columns of [𝐱_i 𝐲_i] and the actual values of the nonzero entries. The algorithm then adds one to the value of ρ. Matrix J_t is assigned this modified J_t′, and matrix M_t is defined as M_t′. Moreover, column j is removed from N_t′ to produce N_t″. The row operations performed up to this point are authentic. If the number of columns in N_t″ is less than 2r_t (note that r_t = r_s − 1 and that N_t″ has n_s − 1 columns), N_t is set to be N_t″. Otherwise, n_t = 2r_t and the algorithm performs bookkeeping operations as in Join Box to define N_t, and updates 𝐪 and 𝐰.
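A sketch of the elimination performed when such a pivot is created (our code; rows are plain lists over an exact field, and in the algorithm only the type 2 rows, i.e., those of N_t′ and J_t′, are modified here):

```python
def clear_column(rows, pivot_row, col):
    """Authentic type II operations: subtract the right multiple of
    pivot_row from every other row so that column `col` becomes zero
    outside the pivot."""
    for r in rows:
        if r is not pivot_row and r[col] != 0:
            f = r[col] / pivot_row[col]
            for idx in range(len(r)):
                r[idx] -= f * pivot_row[idx]
```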

Next assume that v = w_j for a column j. Consider the matrix A_t′ given by

A_t' = \begin{pmatrix} \mathbf{0}_{m_s \times n_s} & M_t' & \mathbf{c}_j \\ N_t' & J_t' & \mathbf{d}_j \end{pmatrix}, (5)

where N_t′ = N_s, M_t′ is obtained from M_s by removing column w_j, 𝐜_j is column w_j of M_s, J_t′ is obtained from J_s by removing column w_j, and 𝐝_j is obtained from column w_j of J_s by adding the entry a_{ij} (of the input matrix) to the entry corresponding to each row v_i.

If 𝐜_j is empty or zero, the algorithm sets J_t = J_t′ and adds 𝐝_j as a new column to N_t′ to produce N_t″. If N_t″ has fewer than 2r_t columns, the algorithm calls this matrix N_t and concludes the step. Otherwise N_t″ has 2r_t columns and the algorithm defines N_t with bookkeeping operations as was done in Join Box (this leads to changes in 𝐪 and 𝐰). Similarly, if m_s < 2c_t, we set M_t = M_t′. However, because c_t = c_s − 1 and the number of rows of M_t′ is m_s, it may be that m_s ≥ 2c_t. If this happens, the algorithm turns M_t′ into a matrix M̃_t in row echelon form and removes c_t zero rows of M̃_t to produce M_t, as was done above (this leads to changes in 𝐩). These operations are authentic.

If 𝐜_j is nonempty and nonzero, the algorithm selects the first nonzero entry of 𝐜_j; suppose that it is in row i (of type 1). Let c_{ij} be this nonzero entry and let 𝐱_i be row i in M_t′. The algorithm then performs authentic row operations in A_t′ using row i to eliminate all the other nonzero elements in column j (note that only rows in M_t′ and J_t′ are modified). It then removes row i from M_t′ to produce M_t″. As in the case of rows, the algorithm defines 𝐩(ρ) = i and 𝐪(ρ) = j, while 𝐮(ρ) records the nonzero entries of [𝐱_i, 𝐲_i]. It then updates ρ to ρ + 1. If the number of rows of M_t″ is less than 2c_t = 2c_s − 2, M_t is set to be M_t″. Otherwise, the algorithm turns M_t″ into a matrix M̃_t in row echelon form with authentic operations and removes c_t zero rows of M̃_t to produce M_t, as was done in Join Box (this leads to changes in 𝐩). Matrix N_t is defined as N_t′.

3 Running time of the algorithm

In this section, we find an upper bound on the number of operations required to produce matrices P, Q and U as in the statement of Theorem 1 using algorithm Fast Gaussian Elimination. If the algorithm is given a tree decomposition 𝒯 with r nodes and width k of the underlying graph G of matrix A, it first computes a nice tree decomposition of the same width k with fewer than 4(m+n) nodes in time O(k(r+m+n)), as discussed above. As it turns out, the number of nodes of each type in this nice tree decomposition is at most m+n.

We consider the number of operations performed at each type of node. Leaf Box initializes a matrix in r_t × c_t = O(k²) trivial steps. Introduce Box uses O(k) steps to create a row or column filled with zeros.

For Join Box and Forget Box, the main cost comes from row operations. As each row has size at most n_t + c_t < 2r_t + c_t ≤ 2(r_t + c_t) ≤ 2k + 2, each such row operation requires O(k) additions and multiplications. Regarding Forget Box, when a vertex v_i associated with a row i or a vertex w_j associated with a column j is forgotten, either a new type 1 row or column is created, or a new pivot is found. If the latter happens, the algorithm performs at most m_t + r_t − 1 = O(k) row operations, where each row has size O(k), leading to O(k²) field operations. Additionally, the algorithm may perform row operations to keep M_t or N_t of the right size, just as in Join Box (see below). If a new type 1 row or column is created, the algorithm just inserts the new row in M_t or the new column in N_t, possibly performing row operations to keep M_t or N_t of the right size, just as in Join Box.

It remains to account for the row operations performed to keep matrices M_t and N_t of the right size. Recall that, every time row operations of this type are necessary to produce M_t, we start with a matrix of dimension m′ × c_t, where m′ ≥ 2c_t, and we create at least m′ − c_t ≥ c_t zero rows. These rows become processed. For N_t, we start with a matrix of dimension r_t × n′, where n′ ≥ 2r_t, and we identify at least n′ − r_t ≥ r_t columns that are linear combinations of other columns. Again, these columns become processed. We shall keep track of the overall number of field operations performed at these steps by “charging” the rows and columns processed at each step, that is, if f field operations are performed and ℓ rows are processed, each of the ℓ rows is charged with f/ℓ operations. Because of the dimensions of the matrices, getting them to row echelon form takes O(m′²c_t) and O(n′r_t²) field operations, respectively. In particular, each row is charged at most O(m′c_t) = O(k²) operations and each column is charged at most O(r_t²) = O(k²) operations. Since any row or column may be processed at most once in this way, the overall number of field operations that the algorithm performs to keep matrices M_t and N_t of the right size is O((m+n)k²).

Overall, the number of operations taken by the algorithm to produce the vectors 𝐩 and 𝐪, in addition to the vectors 𝐮(i) and 𝐰(j) for every i ≤ ρ = rank(A) and every j > ρ, is O(k(|𝒯| + m + n) + k²(m+n)).

We now consider how the matrices P, Q and U are recovered. The matrices P and Q are defined using 𝐩 and 𝐪 with no additional field operations. Indeed, P = (p_{ij}) is the permutation matrix such that p_{𝐩(i),i} = 1 for 1 ≤ i ≤ m, while Q = (q_{ij}) is the permutation matrix such that q_{𝐪(i),i} = 1 for 1 ≤ i ≤ n.
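As an illustration, with 0-indexed vectors the matrix P can be materialized as follows (a sketch; the algorithm itself never needs to build these dense matrices):

```python
def perm_matrix(p):
    """Permutation matrix P = (p_ij) with p_{p(i), i} = 1, as in the text."""
    m = len(p)
    P = [[0] * m for _ in range(m)]
    for i, target in enumerate(p):
        P[target][i] = 1
    return P

print(perm_matrix([2, 0, 1]))  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```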

For U, the entries may be computed in two ways. They are either obtained from a vector 𝐮(i) with no additional computation, or they are obtained as a linear combination of O(k) earlier values using coefficients recorded in 𝐰(j), which requires O(k) sums and products. The number of operations performed to obtain entries of the second type is at most

\sum_{\ell = \mathrm{rank}(A)+1}^{n} (m - \gamma_\ell)\, O(k) = O((n - \mathrm{rank}(A)) k m). (6)

This concludes the analysis.

4 Consequences of Theorem 1

We conclude the paper with the statement of results that are immediate consequences of Theorem 1.

Corollary 3.

If A is an m×n matrix whose bipartite graph is given with a tree decomposition of width k with O(m+n) nodes and b is a column vector of length m, then using O(k²(m+n)) time and arithmetic operations, it can be decided whether the system of linear equations Ax = b has a solution, and if so, such a solution can be produced.

Proof.

To get one solution of the system of linear equations Ax = b, the algorithm that transforms A into row echelon form is augmented by performing all row operations on the right-hand side as well. If one ever gets a zero row on the left-hand side together with a non-zero value on the right-hand side, then the system has no solution. Otherwise, with the left-hand side in row echelon form, a solution is obtained by back-substitution. Here, all variables that correspond to columns that have been identified as linearly dependent during the algorithm, or that simply have no pivot, are set to 0.
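A sketch of the back-substitution step (our code; U is assumed to be in row echelon form over an exact field, b already carries the row operations, and pivots lists the (row, column) pairs in increasing row order):

```python
def back_substitute(U, b, pivots):
    """Return one solution of Ux = b, setting all free variables to 0.
    Assumes consistency (no zero row of U with a nonzero entry of b)."""
    n = len(U[0])
    x = [0] * n
    for row, col in reversed(pivots):
        s = sum(U[row][j] * x[j] for j in range(col + 1, n))
        x[col] = (b[row] - s) / U[row][col]
    return x
```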

Corollary 4.

The following hold if A is an m×n matrix whose bipartite graph is given with a tree decomposition of width k with O(n) nodes:

(a) The rank of A can be computed using O(k²(m+n)) time and arithmetic operations.

(b) For m = n, the determinant of A can be computed using O(k²n) time and arithmetic operations.

Proof.

The rank is just the number of non-zero rows of the output matrix U. If one of the columns is a linear combination of previous columns, the determinant is zero. Otherwise, the determinant is just the product of the signs of the permutation matrices P and Q with the product of the diagonal elements of the equivalent matrix in row echelon form.
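A sketch of the determinant computation from the algorithm's output (our code; sign_P and sign_Q denote the signs of the permutations encoded by 𝐩 and 𝐪, and U is the n×n matrix in row echelon form):

```python
def determinant(U, sign_P, sign_Q):
    """det(A) = det(P) * det(Q) * det(U), since det(PAQ) = det(P)det(A)det(Q),
    permutation matrices have determinant +1 or -1, and the type II
    operations producing U preserve the determinant."""
    d = sign_P * sign_Q
    for i in range(len(U)):
        if U[i][i] == 0:  # some column has no pivot: A is singular
            return 0
        d *= U[i][i]
    return d
```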

Corollary 5.

For a graph given with a tree decomposition of width k with O(n) nodes, the size of a maximum matching can be computed using O(k²n) time and arithmetic operations by a randomized algorithm with one-sided error that is correct with probability at least 1 − 1/n^c for an arbitrary constant c. In case of an error, the algorithm reports a suboptimal value.

Proof.

This follows from Theorem 1.4 of Fomin et al. [5] by replacing their O(k³n) rank algorithm with our O(k²n) rank algorithm.

References

  • [1] U. Bertelè and F. Brioschi. Nonserial Dynamic Programming. Elsevier, 1972.
  • [2] Hans L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM J. Comput., 25(6):1305–1317, 1996. doi:10.1137/S0097539793251219.
  • [3] Hans L. Bodlaender, Pål Grønås Drange, Markus S. Dregi, Fedor V. Fomin, Daniel Lokshtanov, and Michał Pilipczuk. A c^k n 5-approximation algorithm for treewidth. SIAM Journal on Computing, 45(2):317–378, 2016. doi:10.1137/130947374.
  • [4] Sally Dong, Yin Tat Lee, and Guanghao Ye. A nearly-linear time algorithm for linear programs with small treewidth: a multiscale representation of robust central path. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1784–1797. ACM, 2021. doi:10.1145/3406325.3451056.
  • [5] Fedor V. Fomin, Daniel Lokshtanov, Saket Saurabh, Michal Pilipczuk, and Marcin Wrochna. Fully polynomial-time parameterized computations for graphs and matrices of low treewidth. ACM Trans. Algorithms, 14(3):34:1–34:45, 2018. doi:10.1145/3186898.
  • [6] Martin Fürer, Carlos Hoppen, and Vilmar Trevisan. Efficient diagonalization of symmetric matrices associated with graphs of small treewidth. Theoretical Computer Science, 1040:115187, 2025. doi:10.1016/j.tcs.2025.115187.
  • [7] Martin C. Golumbic and Clinton F. Goss. Perfect elimination and chordal bipartite graphs. Journal of Graph Theory, 2(2):155–163, 1978. doi:10.1002/jgt.3190020209.
  • [8] Rudolf Halin. S-functions for graphs. Journal of Geometry, 8(1):171–186, 1976. doi:10.1007/BF01917434.
  • [9] Ton Kloks. Treewidth, Computations and Approximations, volume 842 of Lecture Notes in Computer Science. Springer, 1994. doi:10.1007/BFB0045375.
  • [10] Tuukka Korhonen. A single-exponential time 2-approximation algorithm for treewidth. SIAM Journal on Computing, pages FOCS21–174, November 2023. doi:10.1137/22M147551X.
  • [11] Seymour V. Parter. The use of linear graphs in Gauss elimination. SIAM Review, 3(2):119–130, 1961. doi:10.1137/1003021.
  • [12] Richard Peng and Santosh S. Vempala. Solving sparse linear systems faster than matrix multiplication. Commun. ACM, 67(7):79–86, 2024. doi:10.1145/3615679.
  • [13] Venkatesh Radhakrishnan, Harry B. Hunt III, and Richard E. Stearns. Efficient algorithms for solving systems of linear equations and path problems. In Alain Finkel and Matthias Jantzen, editors, STACS 92, 9th Annual Symposium on Theoretical Aspects of Computer Science, Cachan, France, February 13-15, 1992, Proceedings, volume 577 of Lecture Notes in Computer Science, pages 109–119. Springer, 1992. doi:10.1007/3-540-55210-3_177.
  • [14] Neil Robertson and Paul D. Seymour. Graph minors. I. excluding a forest. J. Comb. Theory B, 35(1):39–61, 1983. doi:10.1016/0095-8956(83)90079-5.
  • [15] Neil Robertson and Paul D. Seymour. Graph minors. II. algorithmic aspects of tree-width. J. Algorithms, 7(3):309–322, 1986. doi:10.1016/0196-6774(86)90023-4.
  • [16] Donald J. Rose. Triangulated graphs and the elimination process. Journal of Mathematical Analysis and Applications, 32(3):597–609, 1970. doi:10.1016/0022-247X(70)90282-9.
  • [17] Donald J. Rose and Robert E. Tarjan. Algorithmic aspects of vertex elimination on directed graphs. SIAM Journal on Applied Mathematics, 34(1):176–197, 1978.
  • [18] Donald J. Rose, Robert E. Tarjan, and George S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM Journal on Computing, 5(2):266–283, 1976. doi:10.1137/0205021.
  • [19] David R. Wood. On tree-partition-width. European Journal of Combinatorics, 30(5):1245–1253, 2009. Part Special Issue on Metric Graph Theory. doi:10.1016/j.ejc.2008.11.010.