Geometric Pattern Matching Reduces to k-SUM

We prove that some exact geometric pattern matching problems reduce in linear time to $k$-SUM when the pattern has a fixed size $k$. This holds in the real RAM model for searching for a similar copy of a set of $k\geq 3$ points within a set of $n$ points in the plane, and for searching for an affine image of a set of $k\geq d+2$ points within a set of $n$ points in $d$-space. As corollaries, we obtain improved real RAM algorithms and decision trees for the two problems. In particular, they can be solved by algebraic decision trees of near-linear height.


Introduction
The k-SUM problem is a fixed-parameter version of the NP-complete SUBSET SUM problem. It consists of deciding, given a set of n numbers, whether any subset of size k sums to zero. The problem for k = 3, known as 3-SUM, is now a well-established bottleneck problem in fine-grained complexity theory (see for instance [1,28] and references therein). While there are many reductions showing 3-SUM- or k-SUM-hardness of computational problems in geometry, only a few reductions to 3-SUM and k-SUM are known. We give examples of computational geometry problems that reduce to 3-SUM or k-SUM.

Our results are motivated by the nontrivial improved upper bounds on the complexity of 3-SUM and k-SUM proven in recent years. While it had long been conjectured that no subquadratic algorithm for 3-SUM exists, it is now known to be solvable in time $O((n^2/\log n)(\log\log n)^2)$ in the real RAM model, and in time $O((n^2/\log^2 n)(\log\log n)^{O(1)})$ if we allow bitwise operations on fixed-length words [25,20,22,14]. The existence of an $O(n^{2-\delta})$ algorithm for some δ > 0 remains an open problem. Using folklore meet-in-the-middle algorithms, k-SUM can be solved in time $O(n^{\lceil k/2\rceil})$ if k is odd, and in time $O(n^{k/2}\log n)$ if k is even. Recently, Kane, Lovett, and Moran [26] showed that it can be solved in time $O(n\log^2 n)$ in the linear decision tree model, improving on previous polynomial bounds [13,19].

Table 1. Known upper bounds on the time complexity of exact geometric pattern matching in various settings (taken from [11] and [23], Chapter 54). We indicate the dependency on the pattern size k.

Transformations    Dimension    Complexity
congruence         2            $O(kn^{4/3}\log n)$ [11]

Geometric pattern matching
We consider two problems involving searching for a given set P of k points, called the pattern, within a larger set S of points, up to some geometric transformation. Here we focus on exact algorithms, in which the pattern must match the subset of points exactly. The two problems are defined as follows.

Problem 1 (SIMILARITY MATCHING).
For a fixed integer k ≥ 3, given a set P of k points in the plane and a set S of n points in the plane, determine whether S contains the image of P under a similarity transformation.

Problem 2 (AFFINE MATCHING).
For fixed integers d ≥ 2 and k ≥ d + 2, given a set P of k points in $\mathbb{R}^d$ containing d + 1 affinely independent points, and a set S of n points in $\mathbb{R}^d$, determine whether S contains the image of P under an affine transformation.
A large body of the computational geometry and pattern recognition literature is dedicated to the problems of finding approximate matches up to some geometric transformation, where the quality of the approximation is typically measured by the Hausdorff distance [15,24,21,6]. For exact pattern matching problems under different families of transformations, known upper bounds on time complexity have been compiled in a survey by Peter Braß [11]. We reproduce them in Table 1.
The complexity of these algorithms is directly related to bounds on the maximum number of occurrences of a pattern or a distance in a set of n points. In fact, such bounds directly yield a lower bound on the computational problem of listing all occurrences of the pattern. A prototypical example is Erdős' unit distance problem; see Braß and Pach [12] for more examples. It is known, in particular, that there can be $\Theta(n^2)$ similar copies of a pattern in an n-point set [18,3,4]. Structural results on the extremal point sets are also known [2]. For affine transformations in $\mathbb{R}^d$, there exist pairs P, S such that S contains $\Theta(n^{d+1})$ copies of P: for instance the d-dimensional lattice $\{1, 2, \dots, n^{1/d}\}^d$ contains $\Theta(n^{d+1})$ affine images of a cube.

Our results
We suppose we can perform exact computations over the reals. Therefore, all the algorithms that we consider are either uniform algorithms in the real RAM model, or nonuniform algorithms in the algebraic decision tree model.
Our main result is the following. We refer the reader to the exact definitions of the k-SUM problem and the notion of randomized linear-time reduction given later.

Theorem 1. SIMILARITY MATCHING reduces to k-SUM in randomized linear time. AFFINE MATCHING reduces to k-SUM in randomized linear time.

Theorem 1 has a number of consequences. Let us consider the special case of the SIMILARITY MATCHING problem in which k = 3.

Problem 3 (TRIANGLE).
Given a triangle ∆ and a set S of n points in the plane, determine whether S contains three points whose convex hull is similar to ∆.
Combining the reduction provided by Theorem 1 with the real RAM algorithm for 3-SUM from Chan [14], we obtain the following.

Corollary 2.
There exists an $O((n^2/\log n)(\log\log n)^2)$-time randomized real RAM algorithm for TRIANGLE. In particular, there exists a subquadratic algorithm to detect equilateral triangles in a point set.
This contrasts with our current knowledge on the related 3-SUM-hard problem of finding three collinear points, also known as GENERAL POSITION TESTING. Despite recent attempts [10,14], it is still an open problem to find a subquadratic algorithm for GENERAL POSITION TESTING.
Our next corollary is obtained directly from known algorithms for k-SUM. It improves on the best known $O(n^{d+1}\log n)$ algorithm whenever k < 2(d + 1).

Corollary 3. There exists an
Finally, we consider the nonuniform decision tree complexity, also known as query complexity, of the two problems. By applying a recent result of Kane, Lovett, and Moran [26], we can bound the number of algebraic tests that are required to detect copies of P in an input set S. In fact, if the pattern P is a fixed parameter, that is, when P is not part of the input, but known at the algorithm design time, then the decision tree in the statement above only involves linear tests.

Corollary 5.
There exist randomized linear decision trees of height $O(n\log^2 n)$ for the fixed-parameter versions of SIMILARITY MATCHING and AFFINE MATCHING, in which P is a fixed parameter of the problems.
In a recent paper, Aronov, Ezra, and Sharir [8] study the following problem: Given three sets A, B, C of n points in the plane, decide whether there exists (a, b, c) ∈ A × B × C that simultaneously satisfies two real polynomial equations. They provide a subquadratic upper bound on the algebraic decision tree complexity of this problem. In a preliminary version of their paper [9] (version 2, Corollary 4.4), they considered the TRIANGLE problem as a special case of this problem. This version also contains a proof that TRIANGLE is 3-SUM-hard. As our result shows, it turns out that this special case is in fact much easier than the general problem, as the two polynomial equations can be made linear. Hence TRIANGLE is actually linear-time equivalent to 3-SUM, and its decision tree complexity is near-linear. We refer to [8,9] for a thorough discussion of the relation between these and other related problems.

Plan
In the next section, we define a number of variants of the k-SUM problem and prove they are all equivalent in the computation model we consider. In Section 3, we prove our main result for SIMILARITY MATCHING. Section 4 considers the AFFINE MATCHING problem. The last section is dedicated to the proof of Corollaries 4 and 5.

Linear degeneracy testing
We first give a definition of the k-SUM problem. Here, k ≥ 3 is a fixed integer, and X is a ring.
Our next problem is often referred to as linear degeneracy testing [7,17]. We consider the cases where $X = \mathbb{R}$ or $\mathbb{C}$ with the usual addition and multiplication operations, or where $X = \mathbb{R}^d$ or $\mathbb{C}^d$ for some integer d ≥ 2, with the vector addition and Hadamard (entrywise) product defined by $(uv)_i = u_i v_i$. In the latter cases, the all-zero vector is denoted by 0, and the all-one vector by 1.
We make two observations. First, these are fixed-parameter problems: the integer k is part of the definition of the problem, not of the input. The same can be assumed for the function f. Such parameters will be referred to as fixed in what follows. Second, using the Hadamard product in the definition of the function f allows us to combine conditions on the sought k-tuples: in the ring X, searching for k-tuples that simultaneously satisfy d linear equations can be cast as $k$-LDT($X^d$).
It is clear that k-SUM is the special case of k-LDT in which $\beta_0 = 0$ and $\beta_i = 1$ for 1 ≤ i ≤ k. On the other hand, k-LDT is not harder than k-SUM.

In what follows, we say that a problem A reduces to problem B in randomized g(n) time if there exists an algorithm in the real RAM model with access to random real numbers in [0, 1] that maps any instance of size n of A to an equivalent instance of B in time O(g(n)) with probability 1.
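The claim that k-LDT is not harder than k-SUM can be illustrated with a minimal sketch. It assumes the k-set formulation (one set $A_i$ per coordinate of the sought tuple) and a linear function $f(x_1,\dots,x_k) = \beta_0 + \sum_i \beta_i x_i$ over the scalars; the function names `ldt_to_ksum` and `ksum_brute_force` are hypothetical, and exact rationals stand in for real-RAM arithmetic.

```python
from fractions import Fraction
from itertools import product

def ldt_to_ksum(sets, beta0, betas):
    """Rescale each set A_i by beta_i so that f(a_1..a_k) = 0 becomes a plain zero sum."""
    scaled = [[Fraction(b) * Fraction(a) for a in A] for A, b in zip(sets, betas)]
    # Absorb the constant term beta_0 into the last set.
    scaled[-1] = [x + Fraction(beta0) for x in scaled[-1]]
    return scaled

def ksum_brute_force(sets):
    """Reference check only: O(n^k) enumeration of one element per set."""
    return any(sum(t) == 0 for t in product(*sets))
```

The transformation touches each input number once, so it runs in linear time for fixed k.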
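Over the reals, the vector and scalar versions of k-SUM are also essentially equivalent, up to such a randomized reduction.

Lemma 7. $k$-SUM($\mathbb{R}^d$) reduces in randomized linear time to $k$-SUM($\mathbb{R}$).

Proof. Given an instance $\{A_1, \dots, A_k\}$ of $k$-SUM($\mathbb{R}^d$), pick a uniform random unit vector $v \in \mathbb{R}^d$ (see for instance Chapter V in Devroye's classical textbook [16] for the generation of random vectors on the unit hypersphere) and consider the sets $A'_i = \{a \cdot v : a \in A_i\}$ for 1 ≤ i ≤ k, where $a \cdot v$ is the usual dot product. They form an instance of $k$-SUM($\mathbb{R}$) such that any solution to the original instance of $k$-SUM($\mathbb{R}^d$) is also a solution. In the other direction, suppose there is a k-tuple $(a_1, \dots, a_k)$ such that $\sum_i a_i \cdot v = 0$ but $\sum_i a_i \neq 0$. Then v is orthogonal to the nonzero vector $\sum_i a_i$, which happens with probability 0. Since there are finitely many k-tuples, every solution of the instance of $k$-SUM($\mathbb{R}$) corresponds to a k-tuple $(a_1, \dots, a_k)$ that is a solution of the instance $\{A_1, \dots, A_k\}$ of $k$-SUM($\mathbb{R}^d$) with probability 1.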
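The projection step can be sketched as follows. This is an illustration under stated assumptions, not the real-RAM procedure: random rationals with large denominators replace random reals, so the probability-1 guarantee becomes a high-probability one, and the vector is not normalized since rescaling v does not change which dot products vanish. The names `project_instance` and `has_zero_sum` are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product

def project_instance(sets, dim, rng):
    """Map each vector a in every set A_i to the scalar a . v for one random v."""
    # Assumption: a random rational vector stands in for a random real direction.
    v = [Fraction(rng.randrange(1, 2**61), 2**61) for _ in range(dim)]
    def dot(a):
        return sum(Fraction(x) * y for x, y in zip(a, v))
    return [[dot(a) for a in A] for A in sets]

def has_zero_sum(scalar_sets):
    """Brute-force k-SUM check on the projected instance (reference only)."""
    return any(sum(t) == 0 for t in product(*scalar_sets))
```

A zero-sum tuple of vectors always projects to a zero-sum tuple of scalars; only the converse relies on the random choice of v.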
We also make the following simple observation:

Searching for a similar copy
Recall that in the TRIANGLE problem, we want to determine whether an input set S of n points in the plane contains three points whose convex hull is similar to a given triangle ∆. The short proof of the following result uses the interpretation of points in the plane as complex numbers, an idea that was exploited in a combinatorial context before [18,27].

Lemma 9. TRIANGLE reduces in linear time to 3-SUM($\mathbb{C}$).
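Proof. Let $u = re^{i\theta}$ be such that the three numbers 0, 1, u are the vertices of a triangle similar to ∆ in the complex plane. Recall that multiplying by $re^{i\theta}$ has a geometric interpretation in the complex plane as scaling by a factor r and rotating by an angle θ. Hence three numbers $a, b, c \in \mathbb{C}$ form a similar copy of this triangle, with a mapped to 0, b to 1, and c to u, if and only if $c - a = u(b - a)$. This is a linear equation on a, b, c, hence TRIANGLE reduces in linear time to 3-LDT($\mathbb{C}$), and from Lemma 6 to 3-SUM($\mathbb{C}$).

Recall that TRIANGLE is also known to be 3-SUM-hard [9], hence it is actually linear-time equivalent to 3-SUM. Our result generalizes naturally to larger patterns.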
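As a hedged illustration of this reduction, the sketch below writes the condition $c - a = u(b - a)$ as $(u-1)a - ub + c = 0$ and builds the three corresponding sets. The names `triangle_to_3sum` and `has_similar_triangle` are hypothetical. Python's floating-point `complex` type is used, which is exact only for small integer coordinates and simple u (the paper assumes exact arithmetic), and the instance is solved by a plain quadratic hashing loop rather than any subquadratic 3-SUM algorithm. Only the orientation encoded by u is tested.

```python
def triangle_to_3sum(points, u):
    """Build sets A, B, C such that a' + b' + c' = 0 iff c - a = u * (b - a)."""
    A = [(u - 1) * s for s in points]
    B = [-u * s for s in points]
    C = list(points)
    return A, B, C

def has_similar_triangle(points, u):
    """Quadratic hashing check of the 3-SUM instance; a and b must be distinct points."""
    A, B, C = triangle_to_3sum(points, u)
    c_set = set(C)
    n = len(points)
    for i in range(n):
        for j in range(n):
            # i != j excludes the degenerate solution a = b = c.
            if i != j and -(A[i] + B[j]) in c_set:
                return True
    return False
```

For instance, `u = 1j` encodes a right isosceles triangle with vertices 0, 1, i.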

Lemma 11. SIMILARITY MATCHING reduces in linear time to $k$-SUM($\mathbb{C}^{k-2}$).
Proof. Let $u_1, \dots, u_{k-2} \in \mathbb{C}$ be such that the set $Q = \{0, 1, u_1, \dots, u_{k-2}\}$ is similar to P in the complex plane. Then k numbers $a_1, \dots, a_k \in \mathbb{C}$ form a similar copy of Q in the complex plane, with $a_1$ mapped to 0, $a_2$ to 1, and so on, if and only if $a_i - a_1 = u_{i-2}(a_2 - a_1)$ for all 3 ≤ i ≤ k. These are k − 2 linear equations on the k complex numbers $a_1, \dots, a_k$, hence SIMILARITY MATCHING reduces in linear time to $k$-LDT($\mathbb{C}^{k-2}$). From Lemma 6, it reduces in linear time to $k$-SUM($\mathbb{C}^{k-2}$).
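The k − 2 linear equations of this proof can be checked directly, as in the following sketch. The helper names `pattern_to_u`, `is_similar_copy`, and `contains_similar_copy` are hypothetical, the brute-force enumeration is for illustration only, and floating-point `complex` values are assumed to be exact for the small integer inputs used here.

```python
from itertools import permutations

def pattern_to_u(P):
    """Normalize the pattern so that P[0] maps to 0 and P[1] maps to 1."""
    p1, p2 = P[0], P[1]
    return [(p - p1) / (p2 - p1) for p in P[2:]]

def is_similar_copy(a, u):
    """Check a_i - a_1 = u_{i-2} (a_2 - a_1) for 3 <= i <= k (0-indexed below)."""
    base = a[1] - a[0]
    return all(a[i] - a[0] == u[i - 2] * base for i in range(2, len(a)))

def contains_similar_copy(S, P):
    """Brute force over ordered k-tuples of distinct points of S (reference only)."""
    u = pattern_to_u(P)
    return any(is_similar_copy(t, u) for t in permutations(S, len(P)))
```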
Again, combining with Observation 8 and Lemma 7, we obtain the first statement of Theorem 1.

Searching for an affine image
We now prove the analogous result for the affine case. As a warm-up, we first consider the following simpler special case of AFFINE MATCHING in which the pattern is a square. Four points form the affine image of vertices of a square if and only if they are the vertices of a (possibly degenerate) parallelogram. Hence the problem can be cast as follows.
Problem 6 (PARALLELOGRAM). Given a set S of n points in the plane, determine whether S contains four points whose convex hull is a parallelogram.
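The linear structure of PARALLELOGRAM can be seen in a short sketch: four distinct points a, b, c, d are the vertices of a (possibly degenerate) parallelogram exactly when a + c = b + d, that is, when the two diagonals share a midpoint. The code below is a direct quadratic hashing check of that equation, not the k-SUM reduction itself; the function name `has_parallelogram` is hypothetical, and the input points are assumed to be distinct with integer or rational coordinates.

```python
from itertools import combinations

def has_parallelogram(points):
    """Return True if two distinct pairs of points have the same coordinate sum."""
    seen = set()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        s = (x1 + x2, y1 + y2)
        # Two distinct pairs with equal sums are disjoint when the points are distinct.
        if s in seen:
            return True
        seen.add(s)
    return False
```

Collinear quadruples satisfying the same equation are accepted as well, in line with the "possibly degenerate" wording above.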
The general case follows from the following observation. Consider a matrix $Q \in \mathbb{R}^{n \times n}$, and let $Q_k$ denote the matrix obtained from Q by replacing its kth column by the column vector $x = (x_1, x_2, \dots, x_n)^T$, where $x_1, x_2, \dots, x_n$ are variables. Then $\det Q_k$ is a linear combination of $x_1, x_2, \dots, x_n$, with coefficients defined by Q.
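This observation is the cofactor (Laplace) expansion of the determinant along the kth column. A short worked form is given below; the symbol $M_{i,k}$ for the minors is notation introduced here for illustration and does not appear in the original text.

```latex
\det Q_k \;=\; \sum_{i=1}^{n} (-1)^{i+k}\, M_{i,k}\, x_i ,
\qquad
M_{i,k} \;=\; \det\bigl( Q \text{ with row } i \text{ and column } k \text{ removed} \bigr).

% Example with n = 2 and k = 2:
\det \begin{pmatrix} q_{1,1} & x_1 \\ q_{2,1} & x_2 \end{pmatrix}
\;=\; q_{1,1}\, x_2 \;-\; q_{2,1}\, x_1 .
```

Each minor $M_{i,k}$ omits column k, so it depends only on Q and not on the variables.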

Lemma 14. AFFINE MATCHING reduces in linear time to $k$-SUM($\mathbb{R}^{d(k-d-1)}$).
Proof. We use the notation [k] := {1, 2, . . . , k}. Let $p_i = (p_{i,1}, \dots, p_{i,d})$ be a row vector representing the ith point of P. From the problem definition, P must contain d + 1 affinely independent points. Since we suppose k and d fixed, these points can be determined in constant time. We therefore assume without loss of generality that they are the first d + 1 points $p_1, \dots, p_{d+1}$.

Let $A = (a_1, \dots, a_k) \in S^k$ be a candidate match. In order for A to be the image of P under an affine transformation, there must be a solution to the system of k linear equations of the form $p_i F + t = a_i$ for all i ∈ [k], with $d^2 + d$ real unknowns $F \in \mathbb{R}^{d \times d}$ and $t \in \mathbb{R}^d$. The system can be decomposed into d systems, one for each coordinate j ∈ [d]. Each consists of k equations with d + 1 unknowns, of the form $p_i F_j + t_j = a_{i,j}$ for i ∈ [k], where $F_j$ is the jth column of F. We consider one such system, for a fixed j ∈ [d], and restrict it to the first d + 1 equations only:

$$Q \begin{pmatrix} F_j \\ t_j \end{pmatrix} = \begin{pmatrix} a_{1,j} \\ \vdots \\ a_{d+1,j} \end{pmatrix}, \qquad \text{where } Q = \begin{pmatrix} p_{1,1} & \cdots & p_{1,d} & 1 \\ \vdots & & \vdots & \vdots \\ p_{d+1,1} & \cdots & p_{d+1,d} & 1 \end{pmatrix}.$$

Since the first d + 1 points of P are affinely independent, Q is invertible and the system defines a unique solution for the coefficients $F_j$ and $t_j$ of the affine transformation. From Cramer's rule, the value of the kth unknown is the ratio $\det Q_k / \det Q$, where $Q_k$ is the matrix obtained by replacing the kth column of Q by $(a_{1,j}, \dots, a_{d+1,j})^T$. From the above observation and the fact that Q does not depend on S, the expressions $\det Q_k / \det Q$ are linear combinations of the values $a_{1,j}, \dots, a_{d+1,j}$, with coefficients determined by P. Hence the explicit solutions for the coefficients $F_j$ and $t_j$ are linear combinations of $a_{1,j}, \dots, a_{d+1,j}$.
A necessary and sufficient condition for A to be a match is that the remaining k − d − 1 points of A are also images of the corresponding points in P. Hence we require that for all i > d + 1 the ith equation $p_i F_j + t_j = a_{i,j}$ is also satisfied by this solution. The unknowns $F_j$ and $t_j$ can be replaced by linear combinations of $a_{1,j}, \dots, a_{d+1,j}$. Hence we obtain a set of k − (d + 1) linear equations on the variables $a_{1,j}, \dots, a_{k,j}$, with coefficients depending on P.
Since these k − (d + 1) equations must hold for all coordinates j ∈ [d] simultaneously, we obtain that AFFINE MATCHING reduces to $k$-LDT($\mathbb{R}^{\ell}$) with $\ell = d(k - (d + 1))$. From Lemma 6 it also reduces to $k$-SUM($\mathbb{R}^{\ell}$). Since d and k are fixed, the reduction takes linear time.
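The test performed on one candidate tuple can be sketched as follows. This is a minimal illustration under stated assumptions: exact rationals replace real-RAM arithmetic, Gauss-Jordan elimination is used instead of Cramer's rule to solve for $F_j$ and $t_j$, the first d + 1 points of P are assumed affinely independent, and the map is not required to be invertible. The names `solve` and `is_affine_image` are hypothetical.

```python
from fractions import Fraction

def solve(Q, rhs):
    """Solve Q z = rhs exactly by Gauss-Jordan elimination; Q must be invertible."""
    n = len(Q)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(Q, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

def is_affine_image(P, A):
    """Check whether A[i] = P[i] F + t for all i, for some matrix F and vector t."""
    d = len(P[0])
    Q = [list(p) + [1] for p in P[:d + 1]]
    for j in range(d):
        # z holds the d entries of F_j followed by t_j, fixed by the first d + 1 points.
        z = solve(Q, [a[j] for a in A[:d + 1]])
        for p, a in zip(P[d + 1:], A[d + 1:]):
            if sum(c * x for c, x in zip(z, list(p) + [1])) != a[j]:
                return False
    return True
```

Each remaining point contributes one linear equation per coordinate, which matches the count $d(k - (d + 1))$ in the proof.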
Combining with the randomization step in Lemma 7, we get the second part of Theorem 1.

Algebraic decision tree complexity
An algebraic decision tree is a type of nonuniform algorithm for problems on inputs composed of n real numbers. For each input size n, it consists of a binary tree whose internal nodes are labeled with inequalities of the form "q(x) ≤ 0" on the input $x \in \mathbb{R}^n$, where q is a bounded-degree n-variate polynomial in $x_1, x_2, \dots, x_n$. Inequalities are interpreted as queries on the input, and the two subtrees correspond to the possible outcomes of the query on the input. Leaves of the tree are labeled with the answer to the problem. The minimum height h(n) of an algebraic decision tree solving instances of size n of the problem is the decision tree complexity, or query complexity, of the problem. When the queries only involve linear functions, such trees are called linear decision trees. In that case, a query is said to be t-sparse when it involves at most t numbers of the input. We have the following recent result on the linear decision tree complexity of the k-SUM problem.

We now show that this result directly applies to the SIMILARITY MATCHING and AFFINE MATCHING problems, thereby proving Corollary 4.
We first consider the SIMILARITY MATCHING problem, an instance y of which consists of two coordinates per point of P and S, hence of 2(k + n) real numbers. Suppose we apply the randomized reduction proposed in Theorem 12 to obtain an instance of k-SUM(R). Now consider the linear decision tree from Theorem 16. Each linear query on the transformed input maps to a query on the original input numbers y. Because the reduction only involves multiplications and additions on these numbers, such queries are algebraic queries on the original input y. Therefore, the linear decision tree for k-SUM maps to an algebraic decision tree of the same height for SIMILARITY MATCHING. The same reasoning applies to AFFINE MATCHING. In that case, it suffices to observe that multiplying both sides of every query by the quantity det Q for the matrix Q used in the proof of Lemma 14 yields algebraic queries again. Note that since k and d are constant and the linear queries in Theorem 16 are sparse, the queries have bounded degree and bounded size. This proves Corollary 4.
Also note that if we suppose the pattern P is a fixed parameter of the problem, then the two problems are solved by linear decision trees of height $O(n\log^2 n)$. It can indeed be checked that the algebraic queries do not involve multiplications between coordinates of the points of S, hence are linear whenever P is fixed. This proves Corollary 5. It applies in particular to the PARALLELOGRAM problem, and to finding an equilateral triangle in a point set.