Efficient Removal Lemmas for Matrices

It was recently proved in Alon et al. (2017) that any hereditary property of two-dimensional matrices (where the row and column order is not ignored) over a finite alphabet is testable with a constant number of queries, by establishing the following ordered matrix removal lemma: For any finite alphabet Γ, any hereditary property P of matrices over Γ, and any ε > 0, there exists f_P(ε) such that for any matrix M over Γ that is ε-far from satisfying P, most of the f_P(ε) × f_P(ε) submatrices of M do not satisfy P.
Here being ε-far from P means that one needs to modify at least an ε-fraction of the entries of M to make it satisfy P. However, in the above general removal lemma, f_P(ε) grows very quickly as a function of ε^{-1}, even when P is characterized by a single forbidden submatrix. In this work we establish much more efficient removal lemmas for several special cases of the above problem. In particular, we show the following, which can be seen as an efficient binary matrix analogue of the triangle removal lemma: For any fixed s × t binary matrix A and any ε > 0 there exists δ > 0, polynomial in ε, such that for any binary matrix M in which less than a δ-fraction of the s × t submatrices are equal to A, there exists a set of less than an ε-fraction of the entries of M that intersects every copy of A in M. We generalize the work of Alon et al. (2007) and make progress towards proving one of their conjectures.
The proofs combine the efficient conditional regularity lemma for matrices of Alon et al. (2007) with additional combinatorial and probabilistic ideas.


Introduction
Removal lemmas are structural combinatorial results that relate the density of "forbidden" substructures in a given large structure S with the distance of S from not containing any of the forbidden substructures, stating that if S contains a small number of forbidden substructures, then one can make S free of such substructures by making only a small number of modifications in it. Removal lemmas are closely related to many problems in Extremal Combinatorics, and have direct implications in Property Testing and other areas of Mathematics and Computer Science, such as Number Theory, Discrete Geometry, and Communication Complexity.
The first known removal lemma was the celebrated (non-induced) graph removal lemma, established by Ruzsa and Szemerédi [26] (see also [3, 4]). This fundamental result in Graph Theory states that for any fixed graph H on h vertices and any ε > 0 there exists δ > 0, such that for any graph G on n vertices that contains at least εn² copies of H that are pairwise edge-disjoint, the total number of copies of H in G is at least δn^h. Many extensions and strengthenings of the graph removal lemma have been obtained, as is described in more detail in Section 2.
In this work, we consider removal lemmas for two-dimensional matrices (with row and column order) over a finite alphabet. For simplicity, the results are generally stated for square matrices, but are easily generalizable to non-square matrices. Some of the results also hold for matrices in more than two dimensions. The notation below is given for two-dimensional matrices, but carries over naturally to other combinatorial structures, such as graphs and multi-dimensional matrices.
An m × n matrix M over the alphabet Γ is viewed here as a function M : [m] × [n] → Γ, and the row and column order is dictated by the natural order on the indices. Any matrix that can be obtained from a matrix M by deleting some of its rows and columns (while preserving the row and column order) is considered a submatrix of M. We say that M is binary if the alphabet is Γ = {0, 1} and ternary if Γ = {0, 1, 2}. A matrix property P over Γ is simply a collection of matrices M : [m] × [n] → Γ. A matrix is ε-far from P if one needs to change at least an ε-fraction of its entries to get a matrix that satisfies P. A property P is hereditary if it is closed under taking submatrices, that is, if M ∈ P then any submatrix M′ of M satisfies M′ ∈ P. For any family F of matrices over Γ, the property of F-freeness, denoted by P_F, consists of all matrices over Γ that do not contain a submatrix from F. Observe that P is hereditary if and only if it is characterized by some family F of forbidden submatrices, i.e. P = P_F.
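To make the submatrix notion concrete, the following brute-force sketch (the function name is ours and the search is exponential, so this is practical only for tiny matrices) checks whether A appears as a row- and column-order preserving submatrix of M:

```python
from itertools import combinations

def contains_submatrix(M, A):
    """Return True if A can be obtained from M by deleting rows and
    columns while preserving the row and column order."""
    m, n = len(M), len(M[0])
    s, t = len(A), len(A[0])
    # Try every choice of s row indices and t column indices; the tuples
    # produced by combinations() are already in increasing order.
    for rows in combinations(range(m), s):
        for cols in combinations(range(n), t):
            if all(M[r][c] == A[i][j]
                   for i, r in enumerate(rows)
                   for j, c in enumerate(cols)):
                return True
    return False

M = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 1]]
A = [[1, 1],
     [0, 1]]
```

Here A is a submatrix of M (rows 2, 3 and columns 1, 3 of M, 1-indexed), so `contains_submatrix(M, A)` returns True.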
While the investigation of graph removal lemmas has been quite extensive, as described in Section 2 below, the first known removal lemma for ordered graph-like two-dimensional structures, and specifically for (row and column ordered) matrices, was only obtained very recently by the authors and Fischer [2]. Theorem 1.1 [2] Fix a finite alphabet Γ. For any hereditary property P of matrices over Γ and any ε > 0 there exists f_P(ε) satisfying the following. If a matrix M is ε-far from P then at least a 2/3-fraction of the f_P(ε) × f_P(ε) submatrices of M do not satisfy P.
However, even when P is characterized by a single forbidden submatrix, the upper bound on f_P(ε) guaranteed by the removal lemma in [2] is very large; in fact, it is at least as large as a wowzer (tower of towers) type function of ε^{-1}. On the other hand, a lower bound of Fischer and Rozenberg [17] implies that one cannot hope for a polynomial dependence of f_P(ε) on ε^{-1} in general (for the non-binary case), even when P is characterized by a single forbidden submatrix.
Thus, it is natural to ask for which hereditary matrix properties P there exist removal lemmas with more reasonable upper bounds on f_P(ε), and specifically, to identify large families of properties P for which f_P(ε) is polynomial in ε^{-1}. In this work we focus on this question, mainly for matrices over a binary alphabet.
A natural motivation for the investigation of removal lemmas comes from property testing. This active field of study in computer science, initiated by Rubinfeld and Sudan [25] (see [20] for the graph case), is dedicated to finding fast algorithms that distinguish between objects that satisfy a certain property and objects that are far from satisfying this property; these algorithms are called testers. An ε-tester for a matrix property P is a (probabilistic) algorithm that is given query access to the entries of the input matrix M, and is required to distinguish, with error probability at most 1/3, between the case that M satisfies P and the case that M is ε-far from P. If the tester always answers correctly when M satisfies P, we say that the tester has one-sided error. We say that P is testable if there is a one-sided error tester for P that makes a constant number of queries (depending only on P and ε but not on the size of the input). Furthermore, P is easily testable if the number of queries is polynomial in ε^{-1}. Clearly, any hereditary property of matrices is testable by Theorem 1.1, while any property P for which f_P(ε) is shown to be polynomial in ε^{-1} is easily testable.
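The tester implied by Theorem 1.1 can be sketched as follows. In this sketch, `satisfies_P` and the sample size `f` are placeholders for the property check and for the bound f_P(ε), which the theorem guarantees but does not make explicit:

```python
import random

def one_sided_tester(M, satisfies_P, f):
    """One-sided tester sketched from Theorem 1.1: sample a uniformly
    random f x f submatrix Q of the n x n matrix M and accept iff Q
    satisfies P. If M satisfies the hereditary property P then so does
    every submatrix, so we always accept; if M is eps-far from P then,
    by Theorem 1.1 with f = f_P(eps), at least a 2/3-fraction of the
    f x f submatrices fail P, so we reject with probability >= 2/3."""
    n = len(M)
    rows = sorted(random.sample(range(n), f))  # preserve row order
    cols = sorted(random.sample(range(n), f))  # preserve column order
    Q = [[M[r][c] for c in cols] for r in rows]
    return satisfies_P(Q)  # True = accept
```

For example, with the (hereditary) toy property "all entries are 0", the tester always accepts an all-zero matrix and always rejects an all-one matrix.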

Background and Main Results
The results here are stated and proved for square n × n matrices, but can be generalized to non-square matrices in a straightforward manner. Our first main result is an efficient weak removal lemma for binary matrices.
Theorem 1.2 Fix an s × t binary matrix A. For any ε > 0 there exists δ = δ(ε, A) > 0, where δ^{-1} is polynomial in ε^{-1}, so that the following holds. For any n × n binary matrix M that contains εn² pairwise-disjoint copies of A, the total number of copies of A in M is at least δn^{s+t}.
Here a set of pairwise-disjoint copies of A in M is a set of s × t submatrices of M, all equal to A, such that any entry of M is contained in at most one of the submatrices. Theorem 1.2 is an analogue for binary matrices of the non-induced graph removal lemma. However, in the graph removal lemma, δ^{-1} is not polynomial in ε^{-1} in general, in contrast to the situation in Theorem 1.2.
Alon, Fischer and Newman [5] proved an efficient induced removal lemma for finite families F of binary matrices that are robust against permutations. A family F of matrices, or equivalently, a hereditary matrix property P_F, is closed under row (column) permutations if for any A ∈ F, any matrix created by permuting the rows (columns, respectively) of A is in F. F is closed under permutations if it is closed both under row permutations and under column permutations. Theorem 1.3 [5] Let F be a finite family of binary matrices that is closed under permutations. For any ε > 0 there exists δ > 0, where δ^{-1} is polynomial in ε^{-1}, such that any n × n binary matrix that is ε-far from F-freeness contains δn^{s+t} copies of some s × t matrix A ∈ F.
The main consequence of Theorem 1.3 is an efficient induced removal lemma for bipartite graphs. Indeed, when representing a bipartite graph by its (bi-)adjacency matrix, a forbidden subgraph H is represented by the family F of all matrices that correspond to bipartite graphs isomorphic to H . Note that F is indeed closed under permutations in this case. Thus, any hereditary bipartite graph property characterized by a finite set of forbidden induced subgraphs is easily testable.
The problem of understanding whether the statement of Theorem 1.3 holds for any finite family F of binary matrices was raised in [5] and is still open. Only recently was it shown in [2] (see also [12]) that the statement holds if one ignores the polynomial dependence, as stated in Theorem 1.1.

Problem 1.4
Is it true that for any fixed finite family F of binary matrices and any ε > 0, there exists δ = δ(ε, F) > 0, where δ^{-1} is polynomial in ε^{-1}, such that any n × n binary matrix M that is ε-far from F-freeness contains δn^{s+t} copies of some s × t matrix A ∈ F? Theorem 1.2 implies that to settle Problem 1.4 it is enough to show the following. Fix a finite family F of binary matrices. Then for any ε > 0 there exists τ > 0, with τ^{-1} polynomial in ε^{-1}, such that any n × n binary matrix that is ε-far from F-freeness contains τn² pairwise disjoint copies of matrices from F.
Our second main result makes progress towards solving Problem 1.4 by generalizing the statement of Theorem 1.3 to any family F of binary matrices that is closed under row (or column) permutations. From now on we only state the results for families that are closed under row permutations, but analogous results hold for families closed under column permutations. Theorem 1.5 Let F be a finite family of binary matrices that is closed under row permutations. For any ε > 0 there exists δ = δ(ε, F) > 0, where δ^{-1} is polynomial in ε^{-1}, so that the following holds. For every n × n binary matrix M that is ε-far from F-freeness there exists an s × t matrix A ∈ F, so that M contains at least δn^{s+t} copies of A.

Corollary 1.6 Any hereditary property of binary matrices that is characterized by a finite forbidden family closed under row permutations is easily testable.
Our proof of Theorem 1.5 is somewhat simpler than the original proof of Theorem 1.3. One of the main tools in the proofs of Theorems 1.2 and 1.5 is an efficient conditional regularity lemma for matrices developed in [5] (see also [22]). In the proof of Theorem 1.5 we only use a simpler form of the lemma, which is also easier to prove. The statement of the lemma and the proofs of Theorems 1.2, 1.5 appear in Section 4.
Besides the above two main results, we also describe a simpler variant of the construction of Fischer and Rozenberg [17], showing that for ternary matrices, the dependence between the parameters is not polynomial in general. We further suggest a way to tackle the weak removal lemma (i.e. the analogue of Theorem 1.2, without the polynomial dependence) in high dimensional matrices over arbitrary alphabets, by reducing it to an equivalent problem that looks more accessible. For more details, see Section 5.

Related Work
Removal lemmas have been studied extensively in the context of graphs. The non-induced graph removal lemma (which was stated at the beginning of Section 1) was one of the first applications of the celebrated Szemerédi graph regularity lemma [28]. The induced graph removal lemma, established in [4] by proving a stronger version of the graph regularity lemma, is a similar result concerning induced subgraphs. It states that for any finite family F of graphs and any ε > 0 there exists δ = δ(F, ε) > 0 with the following property. If an n-vertex graph G is ε-far from F-freeness, then it contains at least δn^{v(F)} induced copies of some F ∈ F. Here v(F) denotes the number of vertices of F, and G is said to be (induced) F-free if no induced subgraph of G is isomorphic to a graph from F.
The induced graph removal lemma was later extended to infinite families [9], stating the following. For any finite or infinite family F of graphs and any ε > 0 there exists f_F(ε) with the following property. If an n-vertex graph G is ε-far from F-freeness, then with probability at least 2/3, a random induced subgraph of G on f_F(ε) vertices contains a graph from F. Note that when F is finite, the statement of the infinite induced removal lemma is indeed equivalent to that of the finite version of the induced removal lemma.
The graph removal lemma was also extended to hypergraphs [21,23,24,29]. See [15] for many more useful variants, quantitative strengthenings and extensions of the graph removal lemma.
Very recently, the authors and Fischer [2] generalized the (finite and infinite) induced graph removal lemma by obtaining an order-preserving version of it, and also showed that the same type of proof can be used to obtain a removal lemma for two-dimensional matrices (with row and column order) over a finite alphabet; this is Theorem 1.1 above. In [11], the second author and Fischer proved characterization-type results in property testing of matrices (with row and column order) and ordered graphs. A special case of one of the results in [11] shows, building on [2], that the distance of a given matrix (or ordered graph) from any hereditary property of matrices (or ordered graphs) can be estimated, with good probability and up to a constant additive error, using a constant number of queries.
However, even for the non-induced graph removal lemma where the forbidden subgraph is a triangle, the best known general upper bound on δ^{-1} in terms of ε^{-1} is of tower type [14, 18]. On the other hand, the best known lower bound for the dependence is super-polynomial but sub-exponential, and builds on a construction of Behrend [10]. See [1] for more details. Understanding the "right" dependence of δ^{-1} on ε^{-1}, even for the simple case where the forbidden graph H is a triangle, is considered an important and difficult open problem.
In view of the above discussion, a lot of effort has been dedicated to the problem of characterizing the hereditary graph properties P for which f_P(ε) is polynomial in ε^{-1}, i.e., the easily testable graph properties. See the recent work of Gishboliner and Shapira [19]; for other previous works on this subject, see, e.g., [1, 6, 8]. Our work also falls under this category, but for (ordered) matrices instead of graphs; it is the first work of this type for ordered two-dimensional graph-like structures.
We finish by mentioning several other relevant removal lemma type results. Removal lemmas for vectors (i.e. one dimensional matrices where the order is important) are generally easier to obtain; in particular, a removal lemma for vectors over a fixed finite alphabet can be derived from a removal lemma for regular languages proved in [7]. A removal lemma for partially ordered sets with a grid-like structure, which can be seen as a generalization of the removal lemma for vectors, can be deduced from a result of Fischer and Newman in [16], where they mention that this problem for submatrices is more complicated and not understood. Recently, Ben-Eliezer, Korman and Reichman [13] obtained a removal lemma for patterns in multi-dimensional matrices. A pattern must be taken from consecutive locations, whereas in our case the rows and columns of a submatrix need not be consecutive. The case of patterns behaves very differently than that of submatrices, and in particular, in the removal lemma for patterns the parameters are linearly related (for any alphabet size) unlike the case of submatrices (in which, for alphabets of 3 letters or more, the relation cannot be polynomial).

Notation
Here we give some more notation that will be useful throughout the rest of the paper. We give the notation for rows; the analogous notation for columns is used as well. Let M : [m] × [n] → Γ be an m × n matrix. For two rows of M with indices r < r′, we say that row r is smaller than row r′ and that row r′ is larger than row r. The predecessor of row r in M is the largest row r̃ in M that is smaller than r; in this case we say that r is the successor of r̃.
Let S be the submatrix of M on {r_1, . . . , r_s} × {c_1, . . . , c_t}, where r_1 < . . . < r_s and c_1 < . . . < c_t. For i = 1, . . . , s, the i-row-index of S in M is r_i; the j-column-index of S is defined analogously. Given a set X = {x_1 < . . . < x_{s−1}} of row indices and a set Y = {y_1 < . . . < y_{t−1}} of column indices, we say that X × Y separates S if, setting x_0 = y_0 = 0 and x_s = y_t = n, the i-row-index of S is bigger than x_{i−1} and no bigger than x_i for every 1 ≤ i ≤ s, and the analogous condition holds for the column indices of S with respect to Y. The elements of X, Y are called row-separators, column-separators respectively.

Folding and Unfoldable Matrices
A matrix is unfoldable if no two neighboring rows in it are equal and no two neighboring columns in it are equal. The row folding (column folding) of a matrix A is the unique matrix obtained from A by deleting every row (column, respectively) of A that is equal to its predecessor. The folding of A, denoted Ã, is the column folding of the row folding of A. Note that Ã is unfoldable, and that Ã is also the row folding of the column folding of A.
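The folding operations can be sketched directly from these definitions (a minimal illustration; the function names are ours):

```python
def row_folding(A):
    """Delete every row that is equal to its predecessor (previous row)."""
    return [row for i, row in enumerate(A) if i == 0 or row != A[i - 1]]

def column_folding(A):
    # Fold the columns by transposing, folding the rows, and transposing back.
    T = [list(col) for col in zip(*A)]
    return [list(row) for row in zip(*row_folding(T))]

def folding(A):
    """The folding of A: the column folding of the row folding of A
    (the two operations commute, as noted above)."""
    return column_folding(row_folding(A))
```

For instance, the matrix with rows (0,0,1,1), (0,0,1,1), (1,1,0,0) folds to the unfoldable 2 × 2 matrix with rows (0,1), (1,0), and folding the columns first gives the same result.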
For any 1 ≤ i ≤ s, where s is the number of rows of A, define r_A(i) as the unique integer r satisfying the following: if one deletes all rows of A numbered j < i that are equal to their successors, then the row originally numbered i becomes row number r in the new matrix. Note that r_A(1) = 1 and that r_A(s) is the number of rows of Ã.
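A direct way to compute r_A(i) from this definition (an illustrative helper, using 1-indexed rows as in the text):

```python
def r_A(A, i):
    """r_A(i) for 1 <= i <= number of rows of A: delete every row
    numbered j < i (1-indexed) that equals its successor, and return
    the new index of the row originally numbered i."""
    # 1-indexed rows j, j+1 correspond to A[j-1], A[j]; j ranges over
    # 1, ..., i-1, i.e. 0-indexed pairs (0, 1), ..., (i-2, i-1).
    deleted = sum(1 for j in range(i - 1) if A[j] == A[j + 1])
    return i - deleted
```

For the matrix with rows (0,0), (0,0), (1,1), for example, r_A(1) = r_A(2) = 1 and r_A(3) = 2, matching the two rows of its row folding.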

Lemma 3.1 Fix an s × t matrix A and let Ã be its s′ × t′ folding. For any ε > 0 there exist n_0 = n_0(A, ε) > 0 and δ = δ(A, ε) > 0, where n_0 and δ^{-1} are polynomial in ε^{-1}, such that for any n ≥ n_0, any n × n matrix M that contains εn^{s′+t′} copies of Ã also contains δn^{s+t} copies of A.
Lemma 3.1 implies that generally, to prove removal lemma type results for finite families, it is enough to only consider families of unfoldable matrices. The proof follows by applying the following lemma twice.

Lemma 3.2 Let A be a fixed s × t matrix and let A′ be the s′ × t row folding of A. Then for any ε > 0 there exist n_1 = n_1(A, ε) > 0 and τ = τ(A, ε) > 0, where n_1 and τ^{-1} are polynomial in ε^{-1}, such that for any n ≥ n_1, any n × n matrix M that contains εn^{s′+t} copies of A′ also contains τn^{s+t} copies of A. The analogous statement for the column folding of A holds as well.
Indeed, one can deduce Lemma 3.1 from Lemma 3.2 as follows. First, denote the s′ × t row folding of A by A′ and recall that Ã is the column folding of A′. By Lemma 3.2 (in its column version), if an n × n matrix M contains εn^{s′+t′} copies of Ã, then M contains τn^{s′+t} copies of A′, where τ^{-1} is polynomial in ε^{-1}. Applying Lemma 3.2 once again, we conclude that M contains δn^{s+t} copies of A, where δ^{-1} is polynomial in τ^{-1} and so in ε^{-1}.
Proof of Lemma 3.2 Suppose that the row folding A′ of A has dimensions s′ × t. Let T be the family of all n × t submatrices S of M containing at least εn^{s′}/2 copies of A′. The total number of s′ × t submatrices in any S ∈ T is at most n^{s′}, so the number of copies of A′ in submatrices from T is at most |T|n^{s′}. On the other hand, the number of n × t submatrices of M is at most n^t, so the number of A′ copies in n × t submatrices not in T is less than εn^{s′+t}/2. Hence the total number of A′ copies in submatrices from T is at least εn^{s′+t}/2, implying that |T| ≥ εn^t/2.
We claim that any S ∈ T contains a collection A(S) of εn/2s′ pairwise disjoint copies of A′. To show this, we follow a greedy approach, starting with the collection B of all A′ copies in S and with A(S) empty. As long as B is not empty, we arbitrarily choose a copy C ∈ B of A′, add C to A(S), and delete all A′ copies intersecting C (including C itself) from B. In each step, the number of deleted copies is at most s′n^{s′−1}, so the number of steps is at least (εn^{s′}/2)/(s′n^{s′−1}) = εn/2s′.
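The greedy selection used in this argument can be sketched as follows, with each copy represented as a set of entry positions (an illustrative helper, not part of the original proof):

```python
def greedy_disjoint(copies):
    """Greedily select pairwise disjoint copies: repeatedly pick an
    arbitrary remaining copy and discard every copy intersecting it
    (including itself). Each copy is a frozenset of (row, col) entries."""
    remaining = list(copies)
    chosen = []
    while remaining:
        c = remaining.pop()
        chosen.append(c)
        remaining = [d for d in remaining if not (c & d)]
    return chosen
```

The disjointness bound then follows exactly as in the text: each chosen copy eliminates a bounded number of candidates, so the loop runs for many steps.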
Let δ = ε/3ss′ and take S ∈ T. Assuming that n ≥ δ^{-1}, pick disjoint collections A_1, . . . , A_s ⊆ A(S), each consisting of δn copies, such that every row participating in a copy from A_i is smaller than every row participating in a copy from A_{i+1}. Then there are at least δ^s n^s copies of A in S. Indeed, each s × t submatrix of S whose i-th row is taken as row number r_A(i) of a matrix from A_i is equal to A. Therefore, the total number of A copies in M is at least |T|δ^s n^s ≥ εδ^s n^{s+t}/2, as desired.

Proofs for the Binary Case
This section is dedicated to the proof of our main results in the binary domain: Theorem 1.2 and Theorem 1.5. As a general remark for the proofs in this section, we may and will assume that a square matrix M is sufficiently large (given ε > 0), by which we mean that M is an n × n matrix with n ≥ n_0 for a suitable n_0 > 0 that is polynomial in ε^{-1} (and also depends on the "small" matrix A in Theorem 1.2 and on the forbidden family F in Theorem 1.5).
One of the main tools in the proofs of this section is a conditional regularity lemma for matrices due to Alon, Fischer and Newman [5]. We describe a simpler version of the lemma (this is Lemma 4.1 below) along with another useful result from their paper (Lemma 4.2 below). Combining these results together yields the original version of the conditional regularity lemma used in the original proof of Theorem 1.3 in [5]. It is worth noting that even though Theorem 1.5 generalizes Theorem 1.3, for its proof we only need the simpler Lemma 4.1 and not the original regularity lemma, whose proof requires significantly more work. Lemma 4.2 is only used in the proof of Theorem 1.2.
We start with some definitions. A (δ, r)-row-clustering of an n × n matrix M is a partition of the set of rows of M into r + 1 clusters R_0, . . . , R_r such that the error cluster R_0 satisfies |R_0| ≤ δn and for any i = 1, . . . , r, every two rows in R_i differ in at most δn entries. That is, for every e, e′ ∈ R_i, one can make row e equal to e′ by modifying at most δn entries. A (δ, r)-column-clustering is defined analogously on the set of columns of M. The first conditional regularity lemma states the following. Lemma 4.1 [5] Let k be a fixed positive integer and let δ > 0 be a small real number. For every n × n binary matrix M with n > (k/δ)^{O(k)}, either M admits (δ, r)-clusterings for both the rows and the columns with r ≤ (k/δ)^{O(k)}, or for every k × k binary matrix A, at least a (δ/k)^{O(k²)} fraction of the k × k submatrices of M are equal to A. Let R be a set of rows and let C be a set of columns in an n × n matrix M. The block R × C is the set of entries (i, j) of M with i ∈ R and j ∈ C; a block is δ-homogeneous if all but a δ-fraction of its entries are equal to a single value, called the value of the block. A (δ, r)-partition of M is a pair P = (R, C), where R = {R_1, . . . , R_r} is a partition of the set of rows and C = {C_1, . . . , C_r} is a partition of the set of columns of M, such that all but a δ-fraction of the entries of M lie in blocks R_i × C_j that are δ-homogeneous. The second result that we need from [5], relating clusterings and partitions of a matrix, is as follows.
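The clustering definition can be made concrete with a small checker (illustrative only; clusters are given as lists of row indices, with the error cluster R_0 first):

```python
def is_row_clustering(M, clusters, delta):
    """Check whether clusters = [R_0, R_1, ..., R_r] is a
    (delta, r)-row-clustering of the n x n matrix M: the error cluster
    R_0 has at most delta*n rows, and within every other cluster, any
    two rows differ in at most delta*n entries."""
    n = len(M)
    if len(clusters[0]) > delta * n:
        return False
    for cluster in clusters[1:]:
        for a in cluster:
            for b in cluster:
                diff = sum(M[a][j] != M[b][j] for j in range(n))
                if diff > delta * n:
                    return False
    return True
```

For example, a 4 × 4 matrix whose first two rows agree on all but one entry, and whose last two rows agree on all but one entry, admits a (1/4, 2)-row-clustering with an empty error cluster.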
Lemma 4.2 [5] Let δ > 0 and let r be a fixed positive integer. If a square binary matrix M has (δ²/16, r)-clusterings R, C of the rows and the columns respectively, then P = (R, C) is a (δ, r + 1)-partition of M. For the proofs of the above lemmas see [5]. We continue to the proof of Theorem 1.2. The following lemma is a crucial part of the proof. Lemma 4.3 Let A be a fixed s × t binary matrix. For any ε > 0 there exists τ = τ(ε, A) > 0, where τ^{-1} is polynomial in ε^{-1}, such that the following holds for any large enough n × n binary matrix M that contains εn² pairwise disjoint copies of A: either M contains τn^{s+t} copies of every s × t binary matrix, or there exist a set X of row indices and a set Y of column indices such that M contains τn² pairwise disjoint copies of A, all separated by X × Y.
Before providing the full proof of Lemma 4.3, we present a sketch of the proof. Clearly, whenever we apply Lemma 4.1 throughout the proof, we may assume that the outcome is that M has suitable row and column clusterings, as the other possible outcome of Lemma 4.1 finishes the proof immediately. The main idea of the proof is to gradually find row-separators, and then column separators, while maintaining a large set of pairwise disjoint copies of A that conform to these separators. This is done inductively (first for the rows, and then for the columns). The inductive step is described in what follows.
Assume we currently have j − 1 ≥ 0 row-separators, and a set A of many pairwise disjoint A copies that have their first j rows separated by these row-separators. We take a clustering of the rows of M, and consider a cluster in which many rows are "good", in the sense that they contain the j-th row of many of the disjoint A copies from A. We put our j-th separator at the median row among the good rows. Next, we consider a matching of pairs (r_1, r_2) of good rows, where in each such pair r_1 lies before the j-th separator and r_2 lies after the j-th separator. Observe that all good rows lie after the (j − 1)-th separator.
If we take all pairwise-disjoint A copies from A whose j-th row is r_2, and "shift" their j-th row to be r_1, then most of them will still be A copies (as rows r_1 and r_2 are very similar, since they are in the same row cluster). This process creates a set A′ of many pairwise disjoint A copies whose i-th row lies between separators i − 1 and i for any i ≤ j, and whose (j + 1)-th row lies after separator j. This finishes the inductive step.
We now continue to the full proof of Lemma 4.3.
Proof of Lemma 4.3 Let ε > 0 and let M be a large enough n × n binary matrix containing a collection U_0 of εn² pairwise disjoint A copies.
We prove the following claim by induction on i, for 0 ≤ i ≤ s: there exist δ_i, τ_i > 0, with δ_i^{-1} and τ_i^{-1} polynomial in ε^{-1}, such that either M contains τ_i n^{s+t} copies of any s × t matrix, or there exist 0 = x_0 < x_1 < . . . < x_i and a set U_i of δ_i n² pairwise disjoint A copies in M whose j-th row is bigger than x_{j−1} and no bigger than x_j for any 1 ≤ j ≤ i, and whose (i + 1)-th row is bigger than x_i. The base case i = 0 is trivial with δ_0 = ε. Suppose now that i ≥ 1 and that x_0, . . . , x_{i−1}, δ_{i−1} and U_{i−1} are already determined. Applying Lemma 4.1 to M with parameters k = max{s, t} and δ_{i−1}/4, either M contains τ_i n^{s+t} copies of any s × t matrix (and we are done), or M admits a (δ_{i−1}/4, r_i)-row-clustering for some r_i polynomial in δ_{i−1}^{-1}. The number of rows of M that contain the i-th row of at least δ_{i−1}n/2 of the copies of A in U_{i−1} is at least δ_{i−1}n/2, since the number of copies of A in U_{i−1} whose i-th row is not such a row of M is less than n · δ_{i−1}n/2 = δ_{i−1}n²/2. Let R_i be a row cluster that contains at least δ_{i−1}n/2r_i such rows. Note that all of these rows are bigger than x_{i−1}. Take subclusters R_i^1, R_i^2 of R_i, each containing at least ⌊δ_{i−1}n/4r_i⌋ ≥ δ_{i−1}n/5r_i such rows (the inequality holds for n large enough), where each row in R_i^1 is smaller than each row in R_i^2. Take x_i to be the index of the biggest row in R_i^1. Arbitrarily take δ_{i−1}n/5r_i couples of rows (r, r′), where r ∈ R_i^2 and r′ ∈ R_i^1 and every row participates in at most one couple. Let (r, r′) be such a couple. There exist δ_{i−1}n/2 s × t submatrices of M that are A copies from U_{i−1} and whose i-th row is r. Moreover, for any j < i the j-th row of each of these submatrices lies between x_{j−1} (non-inclusive) and x_j (inclusive). Since r and r′ differ in at most δ_{i−1}n/4 entries, there are at least δ_{i−1}n/4 such submatrices T that satisfy the following: if we modify T by taking its i-th row to be r′ instead of r, then T remains an A copy.
Moreover, after the modification, the i-th row of T is in R_i^1 and is therefore no bigger than x_i, whereas the (i + 1)-th row of T is bigger than the i-th row of T before the modification, which is bigger than x_i, as needed. For every couple (r, r′) we can thus produce δ_{i−1}n/4 pairwise disjoint copies of A whose j-th row is between x_{j−1} and x_j for any j ≤ i and whose (i + 1)-th row is after x_i. There are δ_{i−1}n/5r_i such couples (r, r′), and in total we get a set U_i of δ_i n² copies of A with the desired structure, for δ_i = δ_{i−1}²/20r_i, where δ_i^{-1} is polynomial in δ_{i−1}^{-1} and so in ε^{-1}. Note that the copies in U_i are pairwise disjoint. At the end of the process there is a set U = U_s of δ_s n² pairwise disjoint copies of A whose rows are separated by X = {x_1, . . . , x_{s−1}}. A feature that is useful in what follows is that each copy in U has exactly the same set of columns (as a submatrix of M) as one of the original copies of U_0. Now we apply the same process as above for columns instead of rows, starting with the δ_s n² copies in U. At the end of the process, we obtain that for some τ̂_t, δ̂_t, such that τ̂_t^{-1} and δ̂_t^{-1} are polynomial in δ_s^{-1} and so in ε^{-1}, either M contains τ̂_t n^{s+t} copies of any s × t matrix, or there exists a set Û of δ̂_t n² pairwise disjoint copies of A whose columns are separated by a set of indices Y of size t − 1. Moreover, by the above feature, each of the copies in Û has the same set of rows as some copy of A from U, so each copy has its rows separated by X. Hence X × Y separates all copies in Û. Taking τ = min{τ̂_t, δ̂_t} finishes the proof.
Next we show how Theorem 1.2 follows from Lemma 4.3. The idea of the proof is to show, using Lemmas 4.2 and 4.3, that there is a partition of M with blocks R_i × C_j (for 1 ≤ i ≤ s, 1 ≤ j ≤ t) satisfying the following.
• All row clusters R_i and all column clusters C_j are large enough.
• All rows of R_i lie before all rows of R_{i+1}, and all columns of C_j lie before all columns of C_{j+1}, for any i and j.
• R_i × C_j is almost homogeneous, and its "popular" value is A(i, j).
Using these properties, it is easy to conclude that M contains many copies of A. We now complete the proof of Theorem 1.2.
Proof of Theorem 1.2 Let A be an s × t binary matrix and let k = max{s, t}. Let ε > 0 and let M be a large enough n × n binary matrix that contains εn² pairwise disjoint copies of A. Lemma 4.3 implies that either M contains τn^{s+t} copies of A, where τ^{-1} is polynomial in ε^{-1} (in this case we are done), or M contains at least τn² pairwise disjoint copies of A separated by X × Y for suitable index subsets X, Y. By Lemma 4.1 we get that either M has (τ²/256, r)-clusterings of the rows and the columns for some r that is polynomial in τ^{-1} and so in ε^{-1}, or at least a ζ = (τ²/256k)^{O(k²)} fraction of the s × t submatrices of M are equal to A; in the second case we are done. Suppose then that M has (τ²/256, r)-clusterings R, C of the rows, columns respectively. The next step is to create refinements of the clusterings. Write the elements of X as x_1 < . . . < x_{s−1} and let x_0 = 0, x_s = n. Partition each R ∈ R into s parts, where the i-th part, for i = 1, . . . , s, consists of all rows in R with index bigger than x_{i−1} and at most x_i. Each such part is also a τ²/256-cluster. Now partition each C ∈ C into t parts in a similar fashion. This creates (τ²/256, (r + 1)k)-clusterings R′, C′ of the rows and the columns respectively (where some of the clusters might be empty). By Lemma 4.2, P = (R′, C′) is a (τ/4, r′)-partition of M, where r′ = (r + 1)k + 1, and each block of the partition has all of its entries between two neighboring row-separators from X and between two neighboring column-separators from Y. There are at most τn²/4 entries of M that lie in non-τ/4-homogeneous blocks of P, and at most τn²/4 entries of M that lie in τ/4-homogeneous blocks of P but do not agree with the value of the block. Therefore, the number of entries as above is no more than τn²/2, and so there exists a set of τn²/2 pairwise disjoint copies of A in M, separated by X × Y, in which all the entries come from τ/4-homogeneous blocks and agree with the value of the block in which they lie.
Hence there exist sets of rows R_1, …, R_s ∈ R′ and sets of columns C_1, …, C_t ∈ C′ and a collection A of τn^2/2(r′)^{2k} pairwise disjoint copies of A separated by X × Y such that for any 1 ≤ i ≤ s, 1 ≤ j ≤ t, the block R_i × C_j is τ/4-homogeneous, has value A(i, j), lies between row-separators x_{i−1} and x_i and between column-separators y_{j−1} and y_j, and contains the (i, j) entry of any copy of A in A. This implies that |R_i|, |C_j| ≥ τn/2(r′)^{2k} for any 1 ≤ i ≤ s and 1 ≤ j ≤ t, so there are (τ/2(r′)^{2k})^{s+t} n^{s+t} s × t submatrices of M whose (i, j) entry lies in R_i × C_j for any i, j. Picking such a submatrix S at random, the probability that S(i, j) ≠ A(i, j) for a specific couple i, j is at most τ/4; thus S is equal to A with probability at least 1 − stτ/4 > 1/2 for small enough τ. Hence the number of copies of A in M is at least (τ/2(r′)^{2k})^{s+t} n^{s+t}/2.
We now turn to the proof of Theorem 1.5. Recall the definitions of an unfoldable matrix and of the folding of a matrix from Section 3. A family of matrices is unfoldable if all matrices in it are unfoldable. The folding of a finite family F of matrices is the set F̃ = {Ã : A ∈ F} of the foldings of the matrices in F. Observe that F̃ is unfoldable for any family F. Note also that if F is closed under (row) permutations then F̃ is also closed under (row) permutations.
We start with a short sketch of the proof before turning to the full proof. As before, we may assume that our matrix M has a row clustering with suitable parameters. We may also assume that the forbidden family F is unfoldable. Consider a submatrix Q of M that contains exactly one "representative" row from each large enough row cluster. The crucial idea is that if Q does not contain many copies of matrices from F, then M is close to F-freeness. Indeed, one can modify all rows in M to be equal to rows from Q without making many entry modifications, and after this modification it is possible to eliminate all copies of matrices from F in M (without creating new F copies) by modifying only those columns in M that participate in some F copy in Q; if Q does not contain many F copies, then the number of such columns is small. Since the above statement is true for any possible choice of Q, we conclude that if M is ε-far from F-freeness then it must contain many copies of matrices from F.
Proof of Theorem 1.5 It suffices to prove the statement of the theorem only for unfoldable families that are closed under row permutations. Indeed, suppose that Theorem 1.5 is true for all unfoldable families that are closed under row permutations. Let F be a family of binary matrices that is closed under row permutations and let F̃ be its folding. Then for any ε > 0 there exists δ̃ > 0 such that any square binary matrix M which is ε-far from F-freeness contains δ̃n^{s̃+t} copies of some s̃ × t matrix B ∈ F̃, where δ̃^{-1} is polynomial in ε^{-1}. Provided that M is large enough (i.e., assuming that M is an n × n matrix where n ≥ n_0 for a suitable choice of n_0 that depends on F and is polynomial in ε^{-1}), we can apply Lemma 3.1 to get that M also contains δn^{s+t} copies of the s × t matrix A ∈ F whose folding is B, for a small enough δ > 0 where δ^{-1} is polynomial in ε^{-1}.
Suppose then that F is an unfoldable finite family of binary matrices that is closed under row permutations. Let k be the maximal row or column dimension of a matrix from F. Let ε > 0 and apply Lemma 4.1 with parameters k and ε/6. Let M be a large enough n × n matrix with n > (k/ε)^{O(k)}. Then either M contains δ_2 n^{2k} copies of any k × k matrix, where δ_2^{-1} is polynomial in ε^{-1}, or M has an (ε/6, r)-clustering of the rows with r polynomial in ε^{-1}. In the first case we are done, so suppose that M has an (ε/6, r)-clustering R = {R_0, …, R_r} of the rows, where R_0 is the error cluster.
Suppose that M is ε-far from F-freeness. We say that a cluster R ≠ R_0 in R is large if it contains at least εn/6r rows, and small otherwise. Let R′ denote the collection of all large clusters. Note that the total number of rows that do not lie in large clusters is less than εn/6 + εn/6 = εn/3. Indeed, since R is an (ε/6, r)-clustering, the number of rows in the error cluster R_0 is at most εn/6, and there are at most r small clusters, each of which contains less than εn/6r rows.

Claim 4.4 Let Q be a submatrix of M, created by arbitrarily picking exactly one row from each large cluster in R′. Then Q contains a collection A(Q) of more than εn/3k pairwise-disjoint copies of matrices from F.
Proof For every large cluster R ∈ R′, let ρ(R) denote the unique row of Q taken from R. Let A = A(Q) be a maximal collection of pairwise-disjoint copies of matrices from F in Q, and suppose to the contrary that |A| ≤ εn/3k. Let C be the set of all columns of M that intersect a copy from A; then C contains no more than εn/3 columns. We can modify M to make it F-free as follows. First modify every row that lies in a large cluster R ∈ R′ to be equal to ρ(R). Then pick some row ρ_0 of Q and modify all rows that are not contained in large clusters to be equal to ρ_0. Finally do the following: as long as C is not empty, pick a column c ∈ C that has a neighbor (predecessor or successor) not in C, modify c to be equal to its neighbor, and then remove c from C.
Since F is unfoldable and closed under row permutations, after these modifications M is F-free. Indeed, after the first and the second steps, all rows of M are equal to rows from Q; the order of the rows does not matter since F is closed under row permutations. Each time that we modify a column c ∈ C in the third step, all copies of matrices from F that intersect it are destroyed, and no new copies are created. By the maximality of A, any copy of a matrix from F in the original Q intersected some column from C, so we are done. The number of entry modifications needed in the first, second and third steps respectively is at most εn^2/6, εn^2/3, εn^2/3, and thus by making only 5εn^2/6 modifications of entries of M we can make it F-free, contradicting the fact that M is ε-far from F-freeness.
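To illustrate, the three-step modification can be written out as a short procedure. The following Python sketch is only an illustration (all identifiers are ours, not from the paper): it takes the cluster structure, the representative rows, and the set C of columns meeting copies from A(Q) as given inputs, and assumes that some column outside C always exists.

```python
def repair(M, large_clusters, rep_rows, other_rows, bad_cols):
    """Three-step repair making M F-free, as in the proof of Claim 4.4.

    M              -- list of rows (lists), modified in place
    large_clusters -- dict: cluster id -> set of row indices
    rep_rows       -- dict: cluster id -> index of the representative row
                      (the row of Q taken from that cluster)
    other_rows     -- iterable of row indices outside all large clusters
    bad_cols       -- the set C of columns meeting a copy from A(Q)
    """
    n = len(M)
    # Step 1: each row of a large cluster becomes its representative rho(R).
    for cid, rows in large_clusters.items():
        rep = list(M[rep_rows[cid]])
        for r in rows:
            M[r] = list(rep)
    # Step 2: all remaining rows become one fixed row rho_0 of Q.
    rho0 = list(M[next(iter(rep_rows.values()))])
    for r in other_rows:
        M[r] = list(rho0)
    # Step 3: while C is non-empty, copy a neighboring column that lies
    # outside C onto a column of C, then remove that column from C.
    C = set(bad_cols)
    while C:
        progress = False
        for c in sorted(C):
            for nb in (c - 1, c + 1):
                if 0 <= nb < len(M[0]) and nb not in C:
                    for r in range(n):
                        M[r][c] = M[r][nb]
                    C.remove(c)
                    progress = True
                    break
            if progress:
                break
        assert progress, "C must not cover all columns"
    return M
```

After the call, every row of M equals a row of the (modified) transversal Q, and every column of C has been overwritten by a neighboring good column, mirroring the three modification steps counted above.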
At this point, the proof of Theorem 1.5 can be completed by combining Claim 4.4 with Theorem 1.2. Indeed, one can produce εn/6r pairwise-disjoint submatrices Q_1, …, Q_{εn/6r}, so that each Q_i contains exactly one row from any large cluster. Applying Claim 4.4 to each Q_i, we get that each such Q_i contains at least εn/3k pairwise-disjoint copies of matrices from F. Since the Q_i are pairwise-disjoint, this implies that M contains at least (εn/6r) · (εn/3k) = (ε^2/18kr)n^2 pairwise-disjoint copies of matrices from F. The proof is concluded by applying Theorem 1.2.
However, the relatively heavy machinery used in the proof of Theorem 1.2 (including multiple applications of the efficient regularity lemma, Lemma 4.1 above) is not really needed here. We now show how to finish the proof of Theorem 1.5 using more elementary tools. Given any Q as in the statement of Claim 4.4, there exist an s × n submatrix T = T(Q) of Q and an s × t matrix A(Q) ∈ F such that at least εn/(3k|F|r^s) of the copies in A(Q) are copies of A that lie in T. The following elementary removal lemma implies that T contains many copies of A.

Claim 4.5 Let A be an s × t binary matrix, let ε > 0 and let T be a large enough s × n binary matrix that contains εn pairwise disjoint copies of A. Then T contains (ε/2t)^t n^t copies of A.

Proof Let ε > 0 and let T be a large enough s × n matrix containing a collection A of εn pairwise disjoint copies of A. We construct t disjoint subcollections A_1, …, A_t of A, each of size ⌈εn/2t⌉ ≤ εn/t, such that for any i < j, all copies in A_i are i-column-smaller than all copies in A_j. This is done by the following process for i = 1, …, t: take A_i to be the set of the ⌈εn/2t⌉ i-smallest copies in A and delete these copies from A. Now observe that any s × t submatrix of T that takes its i-th column (for i = 1, …, t) as the i-th column of some copy from A_i is equal to A. There are (εn/2t)^t such submatrices among all at most n^t s × t submatrices of T, and so T contains (ε/2t)^t n^t copies of A.
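The counting behind Claim 4.5 is easy to check by brute force on small instances. The sketch below is illustrative only (`count_copies` is our hypothetical helper, not from the paper): it counts the s × t submatrices of an s × n matrix T that equal A, and on a T built from column-disjoint copies of A the count exceeds the number of original copies, because mixing an earlier copy's first column with a later copy's second column always yields another copy, exactly as in the proof.

```python
from itertools import combinations

def count_copies(T, A):
    """Count the s x t submatrices of the s x n matrix T that are equal
    to A. All s rows of T are used, so a submatrix is a choice of t
    columns in increasing order."""
    s, n = len(T), len(T[0])
    t = len(A[0])
    total = 0
    for cols in combinations(range(n), t):
        if all(T[i][cols[j]] == A[i][j] for i in range(s) for j in range(t)):
            total += 1
    return total

# T contains three column-disjoint copies of A = (1 0 / 0 1):
A = [[1, 0], [0, 1]]
T = [[1, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1]]
```

Here `count_copies(T, A)` returns 6: the three planted copies plus three mixed ones obtained by combining columns of different planted copies.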
Claim 4.5 implies that any Q defined as in the statement of Claim 4.4 contains γn^t copies of some s × t matrix A(Q) ∈ F, where γ^{-1} is polynomial in (ε/3k|F|r^s)^{-1} and so in ε^{-1}. Next we show how to use this to complete the proof of Theorem 1.5. From any large cluster R in R′ arbitrarily pick a subcluster R* of size exactly εn/6r, and let R* be the collection of all of these subclusters. Define r′ = |R′| = |R*| and let Q denote the collection of all r′ × n submatrices Q having exactly one row in each R* ∈ R*, and no rows elsewhere.
Following the above discussion, we know that there exists an s × t matrix A ∈ F for which at least a |F|^{-1}-fraction of the matrices Q ∈ Q contain γn^t copies of A (note that s ≤ r′ must hold). Let Q′ be the collection of all such Q containing γn^t copies of A. We claim that the probability that a uniformly random s × t submatrix of M equals A is at least δ, where δ^{-1} is polynomial in ε^{-1}. Showing this will complete the proof of Theorem 1.5.
As a first step, we show that sufficiently many of the s × t submatrices of M have at most one row in each subcluster R* ∈ R*, and zero rows elsewhere.

Claim 4.6 Let X be a set of s rows of M, picked uniformly at random. Consider the event that |X ∩ R*| ≤ 1 for any subcluster R* ∈ R*, and Σ_{R* ∈ R*} |X ∩ R*| = s. The probability of this event is at least α, where α^{-1} is polynomial in ε^{-1}.
Proof As we have seen, the number of rows of M not in large clusters is less than εn < n, so r′ ≥ 1 must hold. Let R*_1, …, R*_s be an arbitrary s-tuple of different subclusters in R* (recall that s ≤ r′). Suppose that the rows of X are picked one by one (without repetitions).
Consider the event that for any 1 ≤ i ≤ s, the i-th row of X is picked from R*_i. This event implies (i.e., is contained in) the event appearing in the statement of the claim, and its probability is at least (ε/6r)^s ≥ (ε/6r)^k. The last expression is polynomial in ε, as desired.
Let S be a uniformly random s × t submatrix of M. By the last claim, the probability that all rows of S are taken from ∪_{R* ∈ R*} R*, where at most one row is taken from each R*, is at least some α > 0 with α^{-1} polynomial in ε^{-1}. Conditioning on this event, we may assume that S is generated in the following way: first, exactly one row is picked randomly from each R* to generate a uniformly random submatrix Q ∈ Q, and then S is picked as a uniformly random s × t submatrix of Q. Since Q is of dimensions r′ × n where r′ ≤ r, it has at most n^t r^s submatrices of dimensions s × t. Now if Q ∈ Q′ (this event has probability at least 1/|F|), then Q contains γn^t copies of A. The probability that S equals A in this case is hence at least β = (γn^t)/(n^t r^s) = γr^{-s}. We conclude that the probability that a randomly picked s × t submatrix S of M equals A is at least δ = αβ/|F|. Both α^{-1} and β^{-1} are polynomial in ε^{-1}, and therefore δ^{-1} is polynomial in ε^{-1} as well.

Multi-Dimensional Matrices over Arbitrary Alphabets
As opposed to the polynomial dependence in the above results on binary matrices, Fischer and Rozenberg [17] showed that in analogous results for ternary matrices, as well as binary three-dimensional matrices, the dependence is super-polynomial in general. The proof builds on a construction of Behrend [10]. For the ternary case, it gives the following.
Theorem 5.1 [17] There exists a (finite) family F of 2 × 2 binary matrices that is closed under permutations and satisfies the following. For any small enough ε > 0, there exists an arbitrarily large n × n ternary matrix M that contains εn^2 pairwise-disjoint copies of matrices from F, yet the total number of submatrices from F in M is no more than ε^{−c log ε} n^4, where c > 0 is an absolute constant.
Theorem 5.1 implies that an analogue of Theorem 1.2 with polynomial dependence cannot be obtained when the alphabet is bigger than binary, even when F is a small finite family that is closed under permutations. In Section 5.1 we describe another construction that establishes Theorem 5.1, which is slightly simpler than the original construction in [17].
In what follows, we focus on the problem of finding a "weak" removal lemma analogous to Theorem 1.2 for matrices in more than two dimensions over an arbitrary alphabet. Here we do not try to optimize the dependence between the parameters, but rather to show that such a removal lemma exists. Note that in two dimensions this removal lemma follows from Theorem 1.1, but our results here suggest a direction to prove a weak high dimensional removal lemma without trying to generalize the heavy machinery used in [2] to the high dimensional setting. Our main result here states that this problem is equivalent in some sense to the problem of showing that if a hypermatrix M contains many pairwise-disjoint copies of a hypermatrix A, then it contains a "wide" copy of A; more details are given later. In what follows, we use the term d-matrix to refer to a matrix in d dimensions. An (n, d)-matrix is a d-matrix whose dimensions are n × · · · × n.
A weak removal lemma for families of d-matrices that are closed under permutations follows easily from the hypergraph removal lemma [21,23,24,29] using a suitable construction.

Proposition 5.2 Let Γ be an arbitrary alphabet and let F be a finite family of d-matrices over Γ that is closed under permutations (in all d coordinates). For any ε > 0 there exists δ > 0 such that the following holds. If an (n, d)-matrix M over Γ contains εn^d pairwise disjoint copies of d-matrices from F, then M contains δn^{s_1+···+s_d} copies of some s_1 × · · · × s_d matrix A ∈ F.
Note that Theorem 5.1 implies that the dependence of δ^{-1} on ε^{-1} in Proposition 5.2 cannot be polynomial. The question whether the statement of Proposition 5.2 holds for any finite family F is open for d-matrices with d > 2. Here we state the question in the following equivalent but simpler form.

Problem 5.3 Is it true that for any s_1 × · · · × s_d matrix A and any ε > 0 there exists δ = δ(ε) > 0 such that any (n, d)-matrix M containing εn^d pairwise disjoint copies of A contains δn^{s_1+···+s_d} copies of A?

Our main theorem in this domain shows that Problem 5.3 is equivalent to another statement that looks more accessible. We need the following definition to describe it.
Let M be an n × n matrix and let S be the submatrix of M on the indices a_1 < … < a_s (rows) and b_1 < … < b_t (columns). For 1 ≤ i ≤ s − 1, the i-height of S is (a_{i+1} − a_i)/n, and for 1 ≤ j ≤ t − 1, the j-width of S is (b_{j+1} − b_j)/n; in d dimensions, the (ℓ, i)-width is defined analogously with respect to the ℓ-th coordinate.

Theorem 5.4 The following two statements are equivalent. Statement 1: for any s × t matrix A and any ε > 0 there exists δ = δ(ε) > 0 such that any large enough n × n matrix M containing εn^2 pairwise disjoint copies of A contains δn^{s+t} copies of A. Statement 2: for any s × t matrix A, any ε > 0, any 1 ≤ i ≤ s − 1 and any 1 ≤ j ≤ t − 1, there exists δ′ = δ′(ε) > 0 such that any large enough n × n matrix M containing εn^2 pairwise disjoint copies of A contains a copy of A with i-height at least δ′ and a copy of A with j-width at least δ′.

The proofs of the statements here are given, for simplicity, only for two-dimensional matrices, but they translate directly to higher dimensions. The only major difference in the high-dimensional case is the use of the hypergraph removal lemma instead of the graph removal lemma.
We start with the (simple) proof of Proposition 5.2. The proof uses the non-induced graph removal lemma. Some definitions are required for the proof. An s × t reordering σ = σ_1 × σ_2 is a pair of permutations, where σ_1 acts on the row indices {1, …, s} and σ_2 acts on the column indices {1, …, t}; for an s × t matrix S, σ(S) is the matrix defined by σ(S)(i, j) = S(σ_1(i), σ_2(j)).

Proof of Proposition 5.2 Let k(F) denote the largest row or column dimension of matrices from F. Let ε > 0 and let M be an n × n matrix over Γ that contains εn^2 pairwise-disjoint copies of matrices from F. In particular, there is an s × t matrix A ∈ F such that M contains εn^2/|F| pairwise-disjoint copies of A.
We construct an (s + t)-partite graph G on (s + t)n vertices as follows. There are s row parts R_1, …, R_s and t column parts C_1, …, C_t, each containing n vertices labeled 1, …, n. Any two vertices with different labels that lie in different row parts are connected, and similarly for the column parts. A vertex a ∈ R_i and a vertex b ∈ C_j are connected if and only if M(a, b) = A(i, j).
We now show that there exists a bijection between copies of K_{s+t} in G and couples (S, σ), where S is an s × t submatrix of M and σ is an s × t reordering such that σ(S) = A. Indeed, take the following mapping: a couple (S, σ), where S is the submatrix of M on {a_1, …, a_s} × {b_1, …, b_t} with a_1 < … < a_s and b_1 < … < b_t and σ = σ_1 × σ_2, is mapped to the induced subgraph of G on the vertices a_{σ_1(1)} ∈ R_1, …, a_{σ_1(s)} ∈ R_s and b_{σ_2(1)} ∈ C_1, …, b_{σ_2(t)} ∈ C_t. It is not hard to see that (S, σ) is mapped to a copy of K_{s+t} if and only if σ(S) is equal to A. On the other hand, every copy of K_{s+t} in G has exactly one vertex in each row part and in each column part, and there exists a unique couple (S, σ) mapped to it.
There exist εn^2/|F| pairwise-disjoint copies of A in M that are mapped (with the identity reordering) to edge-disjoint copies of K_{s+t} in G. By the graph removal lemma, there exists δ > 0 such that at least a δ-fraction of the subgraphs of G on s + t vertices are cliques. Therefore, at least a δ-fraction of the possible couples (S, σ) (where S is an s × t submatrix of M and σ is an s × t reordering) satisfy σ(S) = A, concluding the proof.
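For intuition, the construction in the proof can be spelled out for the two-dimensional case. The sketch below is illustrative only (all names are ours); it joins vertices in different row parts (or different column parts) only when their labels differ, so that transversal cliques correspond to genuine submatrices, and it counts the cliques with one vertex per part.

```python
from itertools import combinations, permutations

def matrix_to_graph(M, A):
    """Edge set of the (s+t)-partite graph from the proof (2-dimensional
    case). Vertex ('r', i, a) is row label a in row part R_i, and
    ('c', j, b) is column label b in column part C_j."""
    n, s, t = len(M), len(A), len(A[0])
    E = set()
    for i, k in combinations(range(s), 2):      # row part vs. row part
        for a, a2 in permutations(range(n), 2):
            E.add((('r', i, a), ('r', k, a2)))
    for j, l in combinations(range(t), 2):      # column part vs. column part
        for b, b2 in permutations(range(n), 2):
            E.add((('c', j, b), ('c', l, b2)))
    for i in range(s):                          # cross edges
        for j in range(t):
            for a in range(n):
                for b in range(n):
                    if M[a][b] == A[i][j]:
                        E.add((('r', i, a), ('c', j, b)))
    return E

def count_transversal_cliques(M, A):
    """Count K_{s+t} copies with exactly one vertex in each part; these
    correspond to couples (S, sigma) with sigma(S) = A."""
    n, s, t = len(M), len(A), len(A[0])
    E = matrix_to_graph(M, A)
    adj = lambda u, v: (u, v) in E or (v, u) in E
    total = 0
    for rows in permutations(range(n), s):
        for cols in permutations(range(n), t):
            verts = ([('r', i, rows[i]) for i in range(s)] +
                     [('c', j, cols[j]) for j in range(t)])
            if all(adj(u, v) for u, v in combinations(verts, 2)):
                total += 1
    return total
```

For M = A = (1 0 / 0 1) there are exactly two such cliques, matching the two reorderings σ (the identity and the simultaneous row-and-column swap) with σ(M) = A.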
Next we give the proof of Theorem 5.4. We may and will assume throughout the proof that M is an n × n matrix where n is large enough with respect to ε. The terms i-height and j-width correspond to (1, i)-width and (2, j)-width, respectively, in the definition given before the statement of Theorem 5.4.
Proof of Theorem 5.4 We start with deriving Statement 2 from Statement 1; this direction is quite straightforward, while the other direction is more interesting. Fix an s × t matrix A, let ε > 0 and assume that Statement 1 holds. There exists δ = δ(ε) such that if M contains εn^2 pairwise-disjoint copies of A then it contains δn^{s+t} copies of A. To prove Statement 2 we can pick δ′ = δ′(ε) > 0 small enough such that for any large enough n × n matrix M, any 1 ≤ i ≤ s − 1 and any 1 ≤ j ≤ t − 1, the fraction of s × t submatrices with i-height (or j-width) smaller than δ′ among all s × t submatrices is at most δ/2. Fix 1 ≤ i ≤ s − 1. This choice of δ′ implies that any matrix M containing εn^2 pairwise disjoint copies of A also contains a copy with i-height at least δ′: at least a δ-fraction of the s × t submatrices are copies of A, while at most a δ/2-fraction have i-height smaller than δ′. Similarly, for any 1 ≤ j ≤ t − 1, there is a copy of A with j-width at least δ′.
Next we assume that Statement 2 holds and prove Statement 1. Fix an s × t matrix A over an alphabet Γ, let ε > 0 and let M be a large enough n × n matrix containing a collection A_0 of εn^2 pairwise disjoint copies of A. We show that there exist ε* > 0 that depends only on ε, sets X, Y of row-separators and column-separators respectively, of sizes s − 1 and t − 1, and a collection of ε*n^2 disjoint A copies separated by X × Y in M. Then we will combine a simpler variant of the construction used in the proof of Proposition 5.2 with the graph removal lemma to show that M contains δn^{s+t} copies of A for a suitable δ(ε) > 0.
The number of copies of A in M does not depend on the alphabet, so we may consider A and M as matrices over the alphabet Γ′ = Γ ∪ {α} for some α ∉ Γ, even though all symbols in A and M are from Γ. Without loss of generality we assume that no two entries in A are equal.
Let X_0 = ∅, ε_0 = ε, and let M_0 be the following n × n matrix over Γ′: all A copies in A_0 (which was defined in the previous paragraph) appear in the same locations in M_0, and all other entries of M_0 are equal to α. Clearly, any copy of A in M_0 also appears in M. Next, we construct iteratively for any i = 1, …, s − 1 an n × n matrix M_i over Γ′ that contains a collection A_i of ε_i n^2 pairwise disjoint copies of A, where ε_i > 0 depends only on ε_{i−1}, such that all A copies in M_i also exist in M_{i−1}. We also maintain a set X_i of row separators whose elements are x_1 < … < x_i, such that any entry of M_i between x_{j−1} and x_j for j = 1, …, i (where we define x_0 = 0, x_s = n) is either equal to one of the entries of the j-th row of A or to α.
The construction of M_i given M_{i−1} is done as follows. By Statement 2, there exists δ_i = δ_i(ε_{i−1}) such that any matrix M′ over Γ′ containing at least ε_{i−1}n^2/2 pairwise disjoint copies of A also contains a copy of A with i-height at least δ_i. We start with a matrix M′ equal to M_{i−1} and an empty A_i, and as long as M′ contains a copy of A with i-height at least δ_i, we add it to A_i and modify (in M′) all entries of all A copies from A_{i−1} that intersect it to α. By the separation that X_{i−1} induces on M′, each such copy has its j-th row between x_{j−1} and x_j for any 1 ≤ j ≤ i − 1.
This process might fail at step i only when at least ε_{i−1}n^2/2 of the copies from A_{i−1} in M′ have one of their entries modified. Since in each step at most st copies of A are deleted from M′, at the end A_i contains at least ε_{i−1}n^2/2st pairwise disjoint copies of A with i-height at least δ_i. Pick uniformly at random a row index x_i > x_{i−1}. The probability that a certain copy of A in A_i has its i-th row at or above x_i and its (i + 1)-th row below x_i is at least δ_i. Therefore, the expected number of A copies in A_i with this property is at least ε_i n^2 with ε_i = δ_i ε_{i−1}/2st, so there exists some x_i such that at least ε_i n^2 copies of A in A_i have their first i + 1 rows separated by X_i = X_{i−1} ∪ {x_i}; delete all other copies from A_i. We construct M_i as follows: all copies of A from A_i appear in the same locations in M_i, and all other entries of M_i are equal to α.
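The separator-picking step admits a simple derandomization: trying every candidate x_i and keeping the best one is at least as good as a random choice, so the expectation argument guarantees that the best candidate separates a δ_i-fraction of the copies. A small illustrative sketch (the helper and its interface are ours; copies are abbreviated to tuples of their sorted row indices):

```python
def best_separator(copies, i):
    """Return a row index x, together with the number of copies whose
    i-th row is at most x while their (i+1)-th row exceeds x. The count,
    as a function of x, changes only at row indices of copies, so the
    maximum is attained at the i-th row of some copy; it suffices to try
    those candidates."""
    best_x, best_count = None, -1
    for copy in copies:
        x = copy[i]  # candidate: separate right after this copy's i-th row
        count = sum(1 for c in copies if c[i] <= x < c[i + 1])
        if count > best_count:
            best_x, best_count = x, count
    return best_x, best_count
```

For example, with two-row copies occupying rows (1, 5), (2, 9) and (7, 8), the separator x = 2 splits the first two copies, which is the best achievable with a single separator here.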
After iteration s − 1 we have a matrix M_{s−1} with ε_{s−1}n^2 copies of A separated by X = X_{s−1}. We apply the same process to columns instead of rows, starting with the matrix M_{s−1}. The resulting matrix M* contains ε*n^2 pairwise disjoint copies of A separated by X × Y, where Y consists of the column-separators y_1 < … < y_{t−1}, ε* depends only on ε, and M* only contains copies of A that appeared in the original M.
Finally, construct an (s + t)-partite graph G on 2n vertices as follows. The row parts are R_1, …, R_s and the column parts are C_1, …, C_t, where R_i (respectively C_i) contains the vertices labeled x_{i−1} + 1, …, x_i (respectively y_{i−1} + 1, …, y_i), with x_0 = y_0 = 0 and x_s = y_t = n. Any two row (column) vertices not in the same part are connected. Vertices a ∈ R_i and b ∈ C_j are connected if and only if M*(a, b) = A(i, j). Clearly there exists a bijection between copies of A in M* and K_{s+t} copies in G that maps disjoint A copies to edge-disjoint K_{s+t} copies, so G contains ε*n^2 edge-disjoint (s + t)-cliques. By the graph removal lemma there exists δ = δ(ε*) > 0 such that a δ-fraction of the subgraphs of G on s + t vertices are cliques. Hence at least a δ-fraction of the s × t submatrices of M are equal to A.

Lower Bound
In this subsection we give an alternative constructive proof of Theorem 5.1. Our main tool is the following result in additive number theory from [1], based on a construction of Behrend [10]: for any positive integer m there exists a subset X ⊆ [m] with no non-trivial solution to the equation x_1 + x_2 + x_3 = 3x_4 (a solution is trivial if x_1 = x_2 = x_3 = x_4) whose size is at least m·e^{−c√(log m)} for an absolute constant c > 0. Fix a 2 × 2 binary matrix A whose two entries in each row and in each column are distinct, let F be the family of all matrices obtained from A by row and column permutations, and observe that F is closed under permutations. Let m be a positive integer divisible by 10 and let X ⊆ [m/10] be a subset with no non-trivial solution to the equation x_1 + x_2 + x_3 = 3x_4 that is of maximal size. We construct the following m × m ternary matrix M. For any 1 ≤ i ≤ m/5 and any x ∈ X we put a copy of A in M whose rows are i and m/2 + i + 2x and whose columns are i + x and m/2 + i + 3x; all entries of M not belonging to these copies are set to the third symbol of the alphabet. Let A be the collection of these q = m|X|/5 copies, and note that they are pairwise disjoint and that they are the only copies of matrices from F in M. To see this, suppose that the rows of an A copy in M are i and m/2 + j for some 1 ≤ i, j ≤ m/2; then there exist x_1, x_2, x_3, x_4 ∈ X such that the entries of the copy were taken from locations (i, i + x_1), (i, m/2 + i + 3x_2), (m/2 + j, j − x_3), (m/2 + j, m/2 + j + x_4) in M, and so we have i + x_1 = j − x_3 and i + 3x_2 = j + x_4. Rearranging these two equations we get that 3x_2 = x_1 + x_3 + x_4, implying that x_1 = x_2 = x_3 = x_4 and j = i + 2x_1, so the above A copy is indeed in A.
Let n be an arbitrarily large positive integer divisible by m. Given M as above, we create an n × n 'blowup' matrix N as follows: for any 1 ≤ i, j ≤ n, N(i, j) = M(⌈im/n⌉, ⌈jm/n⌉). N can also be seen as the result of replacing each entry e of M with an n/m × n/m matrix of entries equal to e. The total number of A copies in N is exactly (n/m)^4 q = n^4|X|/5m^3, whereas the maximum number of pairwise disjoint A copies in N is exactly (n/m)^2 q = n^2|X|/5m. Assuming that ε > 0 is small enough and picking m to be the smallest integer that is divisible by 10 and larger than ε^{c log ε} for a suitable absolute constant c > 0, we get that |X|/5m > ε, but the number of A copies in N is at most n^4|X|/5m^3 ≤ n^4/m^2 < ε^{−c log ε} n^4, as needed.
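The construction is easy to experiment with. In the sketch below (illustrative only; all names are ours), the solution-free set is built greedily rather than via Behrend's construction, so it is far sparser than the set used in the proof, and the blow-up follows the definition of N directly, with 0-indexed entries.

```python
def solution_free(limit):
    """Greedily collect X in {1, ..., limit} with no non-trivial solution
    to x1 + x2 + x3 == 3*x4 (trivial means x1 == x2 == x3 == x4). Much
    sparser than a Behrend-type set, but sufficient for small experiments."""
    X = []
    for cand in range(1, limit + 1):
        Y = X + [cand]
        if all(x1 + x2 + x3 != 3 * x4 or x1 == x2 == x3 == x4
               for x1 in Y for x2 in Y for x3 in Y for x4 in Y):
            X.append(cand)
    return X

def blowup(M, n):
    """n x n blow-up of the m x m matrix M: each entry of M is replaced by
    an (n/m) x (n/m) block of equal entries (requires m to divide n)."""
    m = len(M)
    assert n % m == 0
    return [[M[i * m // n][j * m // n] for j in range(n)] for i in range(n)]
```

For instance, the greedy set within {1, …, 10} is {1, 2, 5, 6}, and blowing up a 2 × 2 matrix to 4 × 4 duplicates each entry into a 2 × 2 block.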

Concluding Remarks
Generally, understanding property testing seems to be easier for objects that are highly symmetric. A good example of this phenomenon is the problem of testing properties of (ordered) one-dimensional binary vectors. There are some results on this subject, but it is far from being well understood. On the other hand, the binary vector properties P that are invariant under permutations of the entries (these are the properties in which for any vector v that satisfies P , any permutation of the entries of v also satisfies P ) are merely those that depend only on the length and the Hamming weight of a vector. This makes the task of testing these properties trivial.
A central example of the symmetry phenomenon is the well-investigated subject of property testing in (unordered) graphs, which considers only properties of functions from [n]^2 to {0, 1} that are invariant under permutations of [n]^2 induced by permutations on [n]. That is, if a labeled graph G satisfies some graph property, then any relabeling of its vertices results in a graph that also satisfies this property. Indeed, the proof of the only known general result on testing properties of ordered graphs (here the functions are generally not invariant under permutations), given in [2], is substantially more complicated than the proof of its unordered analogue. See [27] for further discussion on the role of symmetries in property testing.
In general, matrices (with row and column order) do not have any symmetries. Therefore, the above reasoning suggests that proving results on the testability of matrix properties is likely to be harder than proving similar results on properties of matrices where only the rows are ordered (such properties are invariant under permutations of the columns), which might be harder in turn than proving the same results for properties of matrices without row and column orders, i.e. bipartite graphs, as these properties are invariant under permutations of both the rows and the columns. Theorem 1.2 is a weak removal lemma for binary matrices with row and column order, while Theorem 1.3 is an induced removal lemma for binary matrices without row and column order, and our generalization of it, Theorem 1.5, is an induced removal lemma for binary matrices with a row order but without a column order. It will be very interesting to settle Problem 1.4, that asks whether a polynomial induced removal lemma exists for binary matrices with row and column orders.
It will be interesting to expand our knowledge of matrices in higher dimensions and of ordered combinatorial objects in general. Proposition 5.2 is a non-induced removal lemma for (multi-dimensional) matrices without row and column orders. It will be interesting to get results of this type for less symmetric objects, ultimately for ordered multi-dimensional matrices. We believe that providing a direct solution (one that does not go through Theorem 1.1) for the following seemingly innocent problem is of interest, and might help in developing techniques for settling Problem 5.3 in general. In what follows, the height of a 2 × 2 submatrix S in an n × n matrix M is the difference between the indices of the rows of S in M, divided by n. The problem asks whether, for any 2 × 2 binary matrix A and any ε > 0, there exists δ = δ(ε) > 0 such that any n × n binary matrix containing εn^2 pairwise disjoint copies of A contains a copy of A with height at least δ. The three-dimensional analogue of this problem is obviously also of interest. Here Theorem 1.1 cannot be applied, so currently we do not know whether such a δ = δ(ε) that depends only on ε exists. Solving the three-dimensional analogue will settle Problem 5.3 when the forbidden hypermatrix has dimensions 2 × 2 × 2, and the techniques might lead to settling Problem 5.3 in its most general form.
As a final remark, in the results in which δ^{-1} is polynomial in ε^{-1} we have not tried to obtain tight bounds on the dependence, and it may be interesting to do so.