Support Vector Machines in the Hilbert Geometry
Abstract
Support Vector Machines (SVMs) are a class of classification models in machine learning that are based on computing a maximum-margin separator between two sets of points. The SVM problem has been heavily studied for Euclidean geometry and for a number of kernels. In this paper, we consider the linear SVM problem in the Hilbert metric, a non-Euclidean geometry defined over a convex body. We present efficient algorithms for computing the SVM classifier for a set of points in the Hilbert metric defined by convex polygons in the plane and convex polytopes in $d$-dimensional space. We also consider these problems in the related Funk distance.
Keywords and phrases: Support vector machines, Hilbert geometry, linear classification, machine learning, LP-type problems
Funding: Sunil Arya: Research Grants Council of Hong Kong, China, project number 16214721.
2012 ACM Subject Classification: Theory of computation → Computational geometry
Editors: Pat Morin and Eunjin Oh
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Support vector machines (SVMs) were introduced by Vapnik and Chervonenkis [34] as a type of classification algorithm [12]. They take two classes of data that are embedded as points in a multidimensional feature space. Once embedded, the algorithm attempts to separate the two classes using a kernel function, which is chosen so as to maximize the separation distance between them. Throughout, we assume linear kernels, which means that points are separated by a hyperplane.
Two sets of points $P, Q \subset \mathbb{R}^d$ are said to be linearly separable if there exists a $(d-1)$-dimensional hyperplane $h$ such that $P$ and $Q$ lie on opposite sides of $h$, possibly on $h$ itself (see Figure 1(a)). Define $\mathcal{H} = \mathcal{H}(P, Q)$ to be the set of separating hyperplanes. The minimum distance from either set to a separating hyperplane is called its margin. Given two linearly separable point sets $P$ and $Q$ in $\mathbb{R}^d$, the (linear) SVM problem is that of computing the hyperplane $h \in \mathcal{H}$ that maximizes the margin (see Figure 1(b)). The name arises from the fact that, assuming general position, there are points that achieve the margin exactly, called the support vectors.
SVMs are a classic tool in machine learning [13, 31, 28, 5]. They are highly regarded as a classification method due to their geometric interpretability, robustness against over-fitting, practicality with small and high-dimensional datasets, and success in generalizing to unseen data. SVMs are widely used in the fields of vision [32, 23], natural language processing [15, 22], and computational biology [21, 4]. In particular, SVMs have seen a lot of use in image [8, 16] and text classification problems [33].
SVMs are well studied in standard Euclidean geometry and other common geometries, such as hyperbolic geometry [10]. They have also been studied relative to a variety of different kernels [20]. In this paper, we consider the SVM problem in a different geometry, called the Hilbert geometry. In 1895, David Hilbert introduced the Hilbert metric [19]. It is defined on a convex body $\Omega$. The Hilbert metric assigns a distance between two points by considering their relative positions along the chord of $\Omega$ that passes through them. (Definitions will be provided in Section 2.) The Hilbert metric is widely used in science and engineering. It has many desirable properties, including the facts that it is invariant under projective transformations, straight lines are geodesics, and it generalizes a common model of hyperbolic geometry. An excellent resource on Hilbert geometries is the handbook by Papadopoulos and Troyanov [27]. This is the first work to study SVMs in the Hilbert geometry. (Note that SVMs have been studied in the context of reproducing kernel Hilbert spaces [37]. Despite the similarity of the names, Hilbert spaces and Hilbert geometries are entirely different.)
Given the emerging interest in the Hilbert geometry, it is natural to ask whether the classical SVM problem can be efficiently solved in this setting. The input consists of two discrete, linearly separable point sets $P$ and $Q$ in the interior of a convex body $\Omega$, and the objective is to compute a separating hyperplane $h$ that maximizes the minimum Hilbert distance from every point of $P \cup Q$ to $h$. Changing how distances are measured results in very different criteria for the optimum separating hyperplane (see Figure 1(c)). In this paper, we explore these differences and develop an efficient algorithm for the Hilbert SVM problem for convex polygons in the plane. We also generalize our results to the closely related Funk geometry (defined in Section 2.1). The following theorems summarize our main results. For our multidimensional results, $\Omega$ is a full-dimensional convex polytope of total combinatorial complexity $m$. By this, we mean that $\Omega$ is the convex hull of a finite set of points in general position, and $m$ is the total number of faces of all dimensions on its boundary.
Theorem 1.
(i) The Hilbert SVM problem for a set of $n$ points in an $m$-sided convex polygon in $\mathbb{R}^2$ is of LP-type and can be solved in $O(n \log^3 m)$ time.
(ii) For any fixed $d$, the Hilbert SVM problem for a set of $n$ points in a convex polytope $\Omega$ in $\mathbb{R}^d$ is of LP-type and can be solved in $O(n m^3)$ time, where $m$ is the total combinatorial complexity of $\Omega$.
Theorem 2.
(i) The Funk SVM problem for a set of $n$ points in an $m$-sided convex polygon in $\mathbb{R}^2$ is of LP-type and can be solved in $O(n \log m)$ time.
(ii) For any fixed $d$, the Funk SVM problem for a set of $n$ points in a convex polytope $\Omega$ in $\mathbb{R}^d$ is of LP-type and can be solved in $O(n v)$ time, where $v$ is the number of vertices of $\Omega$.
Note that it is possible to test whether $P$ and $Q$ are linearly separable through a standard reduction to linear programming. Through an analysis of the geometry of separating hyperplanes in the Funk and Hilbert geometries, we show that in $\mathbb{R}^d$, these SVM problems are of LP-type [30] of combinatorial dimension $d + 1$. By standard results, in any fixed dimension $d$, LP-type problems can be solved with a linear number of violation tests and evaluations of the objective function on point sets of constant size. (See, e.g., Sharir and Welzl [30], Seidel [29], and Clarkson [11], or the deterministic algorithm of Chazelle and Matoušek [9].) Our main technical contribution is a detailed analysis of the geometric structure of the distance between points and lines and hyperplanes in the Funk and Hilbert geometries. We use this analysis to show how to compute the violation tests and objective function evaluations that are required by algorithms for solving LP-type problems.
1.1 Applications of the Hilbert Geometry
There are a number of applications of the Hilbert geometry in science and engineering. An example involves discrete probability distributions. A discrete probability distribution can be modeled as a point in the probability simplex $\Delta_d = \{x \in \mathbb{R}^{d+1} : x_i \ge 0, \sum_i x_i = 1\}$. Clearly, this body is equivalent to a simplex in $\mathbb{R}^d$, and hence the $d$-dimensional simplicial Hilbert distance provides a natural distance measure between two probability distributions. Nielsen and Sun have demonstrated the benefits of using the Hilbert metric over other standard distance functions for probability distributions [25]. They considered clustering tasks in the probability simplex and the correlation elliptope. They observed that the Hilbert metric performed competitively when compared to other distance models in the probability simplex, such as the Fisher-Hotelling-Rao metric distance and the Kullback-Leibler divergence. They also showed that the Hilbert metric on the probability simplex satisfies the important property of being information monotone [25].
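To illustrate, these distances admit simple closed forms on the probability simplex. The sketch below (with our own helper names) uses the standard fact that the Funk distance between two distributions is the logarithm of the maximum coordinate ratio, combined with the $\frac{1}{2}$-normalization of the Hilbert distance used in Section 2.1.

```python
import numpy as np

def funk_simplex(p, q):
    """Funk distance between points of the open probability simplex,
    using the closed form F(p, q) = ln max_i p_i / q_i."""
    return float(np.log(np.max(p / q)))

def hilbert_simplex(p, q):
    """Hilbert distance with the 1/2-normalization of Eq. (3):
    the arithmetic mean of the Funk and reverse-Funk distances."""
    return 0.5 * (funk_simplex(p, q) + funk_simplex(q, p))

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
print(hilbert_simplex(p, q))  # invariant under projective transformations
```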
Beyond machine learning, the Hilbert metric has found recent uses in the context of convex approximation [2]. This is due to its close relationship with Macbeath regions [35, 1], which, up to scaling factors, are equivalent to Hilbert balls. This has been used to analyze the flag approximability of convex bodies [36] and the construction of efficient convex coverings [3]. Various results from classical computational geometry have been recreated in the Hilbert metric, including algorithms for Voronoi diagrams [18, 6] and Delaunay triangulations [17].
The remainder of the paper is organized as follows. In Section 2, we present preliminary definitions and facts about the Hilbert geometry and SVMs. In Section 3, we present the proof that the Hilbert SVM problem is of LP-type of combinatorial dimension $d + 1$. In Section 4, we present algorithms for computing the primitives of violation testing and computing the objective function. Finally, in Section 5, we present concluding remarks.
2 Mathematical Preliminaries
In this section, we provide a number of definitions and useful facts. Throughout, for our planar results, $\Omega$ is an $m$-sided convex polygon. For our multidimensional results, $\Omega$ is a full-dimensional convex polytope of total combinatorial complexity $m$. By this, we mean that $\Omega$ is the convex hull of a finite set of points in general position, and $m$ is the total number of faces of all dimensions on its boundary. Let $\partial\Omega$ denote its boundary, and let $\mathrm{int}(\Omega)$ denote its interior. Given two points $p, q \in \Omega$, let $c(p, q)$ denote the chord of $\Omega$ passing through these points, that is, the intersection of $\Omega$ with the line through $p$ and $q$. Given a set $X$, let $\mathrm{conv}(X)$ denote its convex hull. Given two points $p$ and $q$, let $\ell(p, q)$ denote the line that passes through them, that is, their affine closure.
2.1 The Funk and Hilbert Distances
In this section, we will introduce the Funk, reverse-Funk, and Hilbert geometries over $\Omega$. For any two points $p, q \in \mathrm{int}(\Omega)$, if $p = q$, all the distances are defined to be zero. Otherwise, let $x$ and $y$ denote the endpoints of the chord $c(p, q)$, ordered as $\langle x, p, q, y \rangle$ (see Figure 2(a)). These distances are defined in terms of the ratios of distances between pairs of these points.
$$F_\Omega(p, q) \;=\; \ln \frac{\|p - y\|}{\|q - y\|} \qquad (1)$$
$$rF_\Omega(p, q) \;=\; F_\Omega(q, p) \;=\; \ln \frac{\|q - x\|}{\|p - x\|} \qquad (2)$$
$$H_\Omega(p, q) \;=\; \frac{F_\Omega(p, q) + rF_\Omega(p, q)}{2} \;=\; \frac{1}{2} \ln\!\left( \frac{\|p - y\|}{\|q - y\|} \cdot \frac{\|q - x\|}{\|p - x\|} \right) \qquad (3)$$
Intuitively, $F_\Omega(p, q)$ is a measure of how far $q$ is from $p$ relative to the portion of the boundary that lies beyond $q$, and $rF_\Omega(p, q)$ is just the reverse of this. The Hilbert distance symmetrizes these by taking their arithmetic mean.
These distance functions are all nonnegative and satisfy the triangle inequality [27]. The Hilbert distance defines a metric over $\mathrm{int}(\Omega)$. The Funk and reverse-Funk distances are asymmetric, and so they each define a weak metric over $\mathrm{int}(\Omega)$. Observe that as either point approaches the boundary of $\Omega$, the Hilbert distance approaches $+\infty$. (The same is true for one of the two points in the Funk and reverse-Funk cases.) The Funk and reverse-Funk distances are invariant under invertible affine transformations. The Hilbert distance has the additional property that it is invariant under invertible projective transformations.
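As a concrete illustration of Eqs. (1)-(3), the following Python sketch computes these distances in a convex polygon by intersecting the chord through $p$ and $q$ with the boundary. It is a minimal implementation (with our own helper names), assuming the polygon is given as a counterclockwise list of vertices and both points are strictly interior; the $O(m)$ boundary scan could be replaced by an $O(\log m)$ binary search.

```python
import numpy as np

def cross2(u, v):
    """Scalar cross product of 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]

def ray_exit(poly, origin, direction):
    """Point where the ray origin + t*direction (t > 0) exits the convex
    polygon poly (a CCW list of numpy vertices). Assumes origin is strictly
    interior; an O(m) scan, which binary search would reduce to O(log m)."""
    best_t = np.inf
    m = len(poly)
    for i in range(m):
        a, b = poly[i], poly[(i + 1) % m]
        edge = b - a
        denom = cross2(direction, edge)
        if abs(denom) < 1e-12:
            continue  # ray parallel to this edge
        t = cross2(a - origin, edge) / denom       # parameter along the ray
        s = cross2(a - origin, direction) / denom  # parameter along the edge
        if t > 1e-12 and -1e-9 <= s <= 1 + 1e-9:
            best_t = min(best_t, t)
    return origin + best_t * direction

def funk(poly, p, q):
    """Funk distance per Eq. (1): y is the boundary point beyond q."""
    if np.allclose(p, q):
        return 0.0
    y = ray_exit(poly, p, q - p)
    return np.log(np.linalg.norm(p - y) / np.linalg.norm(q - y))

def hilbert(poly, p, q):
    """Hilbert distance per Eq. (3): the mean of Funk and reverse-Funk."""
    return 0.5 * (funk(poly, p, q) + funk(poly, q, p))
```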
Letting $d_\Omega$ denote any of the above distance functions, and given a point $p \in \mathrm{int}(\Omega)$ and scalar $r \ge 0$, the associated metric balls $B(p, r) = \{q \in \Omega : d_\Omega(p, q) \le r\}$ are defined in the standard way.
In particular, letting $\beta = 1 - e^{-r}$ and $\beta' = 1 - e^{r}$, it is easy to verify that the Funk and reverse-Funk balls are
$$B_F(p, r) \;=\; p + \beta\,(\Omega - p), \qquad (4)$$
$$B_{rF}(p, r) \;=\; \Omega \cap \bigl(p + \beta'\,(\Omega - p)\bigr). \qquad (5)$$
Thus, the Funk ball is a scaling of $\Omega$ by a factor of $1 - e^{-r}$ about $p$ [26] (see Figure 2(b)), and the reverse-Funk ball is the intersection of $\Omega$ with a scaling of $\Omega$ by a factor of $1 - e^{r}$ about $p$ (see Figure 2(c)). (Since $1 - e^{r} < 0$, the latter scaling inverts $\Omega$ through $p$.) Unlike the Funk ball, this scaled copy may generally extend outside $\Omega$.
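For example, continuing the sketch above, the Funk ball follows directly from Eq. (4) by scaling the polygon's vertices toward $p$:

```python
def funk_ball(poly, p, r):
    """Vertices of the Funk ball B_F(p, r): the polygon poly scaled
    about p by the factor 1 - e^{-r}, per Eq. (4)."""
    beta = 1.0 - np.exp(-r)
    return [p + beta * (v - p) for v in poly]
```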
The Hilbert ball about a point is a bit more complicated to describe. Let us first consider the planar case. Given a point $p \in \mathrm{int}(\Omega)$, define a spoke to be the chord $c(p, v)$, for some vertex $v$ of $\Omega$, and define a half-spoke to be either of the two subsegments of the spoke whose endpoint is $p$. The vertices of $\Omega$ define $m$ spokes, which induce $2m$ half-spokes joining $p$ to the boundary of $\Omega$ (see Figure 2(d)). Nielsen and Shao showed that $B_H(p, r)$ is a convex polygon whose vertices are the $2m$ points lying on these half-spokes at Hilbert distance $r$ from $p$. Furthermore, the edges of the ball satisfy the following concurrency condition [24].
Lemma 3.
Consider a convex polygon $\Omega$ in $\mathbb{R}^2$ and an edge $e$ on the boundary of some Hilbert ball $B_H(p, r)$. Let $e_1$ and $e_2$ denote the edges of $\Omega$ that lie within the double-wedge formed by the two half-spokes that bound $e$ (see Figure 3). Then the linear extensions of $e$, $e_1$, and $e_2$ are concurrent.
This lemma generalizes to $\mathbb{R}^d$, where $e$, $e_1$, and $e_2$ are facets, but the planar version suffices for our results. Since all distances will be defined with respect to $\Omega$ henceforth, we will frequently omit explicit references to $\Omega$ when discussing distances and metric balls.
The concept of spokes can be easily generalized to the case where $\Omega$ is a convex polytope in $\mathbb{R}^d$ and $p \in \mathrm{int}(\Omega)$. In particular, for $j_1 + j_2 = d - 1$, consider two faces $F_1$ and $F_2$ of $\Omega$ of respective dimensions $j_1$ and $j_2$ such that $p$ lies in the convex hull of $F_1 \cup F_2$. It follows from straightforward conditions on dimensionality (see, e.g., [14]) that, assuming general position, for any two such faces, there is at most one line passing through $p$ that intersects both faces. Define a spoke of $p$ to be the chord of $\Omega$ contained in any such line, and again a half-spoke is the portion of the chord lying on one side of $p$. (The planar case arises when $j_1 = 0$ and $j_2 = 1$.)
Lemma 4.
Consider a convex polytope $\Omega$ in $\mathbb{R}^d$ of combinatorial complexity $m$ and a point $p \in \mathrm{int}(\Omega)$. For any $r > 0$, the Hilbert ball $B_H(p, r)$ is a polytope with $O(m^2)$ vertices, each of which lies on a half-spoke emanating from $p$. When $d = 2$, this reduces to $O(m)$.
Proof.
The planar case follows from the analysis of Nielsen and Shao [24]. For the higher-dimensional cases, we appeal to a variational argument. Consider any vertex $w$ of $B_H(p, r)$. Suppose to the contrary that the chord $c(p, w)$ is not a spoke. Consider the points $x$ and $y$ where this chord intersects $\partial\Omega$. It follows that there are nonempty affine neighborhoods around $x$ and $y$, denoted $N_x$ and $N_y$, respectively, $N_x, N_y \subseteq \partial\Omega$, such that lines passing through $p$ and intersecting $N_x$ also intersect $N_y$. This implies that there is a nonempty affine neighborhood around $w$ such that $H_\Omega(p, w') = r$, for all $w'$ in this neighborhood. Hence, $w$ is not a vertex of $B_H(p, r)$. By the observations made above, the number of spokes passing through any point is bounded by the square of the number of faces of $\Omega$, that is, $O(m^2)$.
2.2 On Hilbert and Funk Balls
In this section, we present a number of technical results regarding the geometry and construction of Hilbert and Funk balls for lines and hyperplanes. Due to space limitations, some details are omitted and appear in the full version of the paper.
Let us consider how to compute the reverse-Funk ball $B_{rF}(h, r)$ of radius $r$ for a hyperplane $h$ that intersects a polytope $\Omega$. (Later, we will consider how this applies to the planar case, where $h$ is a line.) Consider any point $q$ that lies on the boundary of $B_{rF}(h, r)$ (see Figure 4(a)). This means that the closest point of $h$ to $q$ lies at Funk distance $r$. Letting $q'$ denote this point, we have $F_\Omega(q, q') = r$. Consider the supporting hyperplane of $\Omega$ that is parallel to $h$ and on the opposite side of $h$ from $q$. Let $w$ be a vertex of $\Omega$ on this supporting hyperplane (see Figure 4(b)). By convexity, the segment $\overline{qw}$ intersects $h$ at some point within $\Omega$. Define $q'$ to be this point. By Eq. (4) (noting that $q$ here plays the role of $p$ in the equation), we have $F_\Omega(q, q') = \ln(\|q - w\| / \|q' - w\|)$. It follows that the set of points $q$ that lie on the boundary of $B_{rF}(h, r)$ defines a hyperplane that is parallel to $h$, where the relative distance between these hyperplanes is determined by $r$. By applying the same logic to the points on the opposite side of $h$, it follows that $B_{rF}(h, r)$ is the intersection of $\Omega$ with a slab bounded by two hyperplanes that are parallel to $h$.
The times to compute $B_{rF}(h, r)$ and $q'$ are dominated by the time needed to compute the supporting hyperplanes of $\Omega$ that are parallel to $h$. If $\Omega$ is an $m$-sided convex polygon in $\mathbb{R}^2$, this can be done in $O(\log m)$ time through binary search. If $\Omega$ is a convex polytope in $\mathbb{R}^d$ with $v$ vertices, this can be done in $O(v)$ time.
Lemma 5.
Consider a convex polytope $\Omega$ in $\mathbb{R}^d$, a $(d-1)$-dimensional hyperplane $h$ that intersects the interior of $\Omega$, and $r \ge 0$. Then:
(i) $B_{rF}(h, r)$ is the intersection of $\Omega$ with a slab that is parallel to $h$.
(ii) For any $q \in \mathrm{int}(\Omega)$, the point $q'$ defined above is the point on $h$ that minimizes the Funk distance $F_\Omega(q, q')$.
If $\Omega$ is an $m$-sided convex polygon in $\mathbb{R}^2$, $B_{rF}(h, r)$ and $q'$ can be computed in $O(\log m)$ time. If $\Omega$ is a convex polytope in $\mathbb{R}^d$ with $v$ vertices, they can be computed in $O(v)$ time.
Next, we consider the Hilbert ball $B_H(\ell, r)$ of radius $r$ about a line $\ell$ that intersects the interior of $\Omega$. The procedure for constructing such a ball was given by Gezalyan and Mount [18, Lemma 6], but we sketch it here for the sake of completeness.
First, extend each of the edges of $\Omega$ until the extension intersects $\ell$. For each such point $x$, consider the triangle defined by the convex hull of the edge that was extended to form $x$ and the vertex where the supporting line of $\Omega$ passing through $x$ touches the opposite side of $\Omega$ (see Figure 5(a)). These triangles subdivide $\Omega$. Each such triangle has two chords that cross $\ell$. On each such chord, create two points at Hilbert distance $r$ from the crossing point with $\ell$. These points are in convex position, and by the projective properties of the Hilbert distance, the line segments connecting them through each triangle are part of the Hilbert ball. Together, these points constitute the vertices of the convex hull of the Hilbert ball of the line (see Figure 5(b)).
Given a point $p \in \Omega$, we can compute its closest point on $\ell$ as follows. Determine the triangle of the above construction that contains $p$ (see Figure 5(c)). Consider the chord passing through $p$ and the appropriate vertex of this triangle. Let $p'$ denote the intersection point of this chord with $\ell$. The chord joining $p$ and $p'$ is called the closest-point spoke for $p$ with respect to $\ell$.
Lemma 6.
Consider a convex polygon $\Omega$ in $\mathbb{R}^2$, a line $\ell$ that intersects the interior of $\Omega$, and $r \ge 0$. Then:
(i) $B_H(\ell, r)$ is generated by the above procedure.
(ii) $p'$ is the point on $\ell$ that minimizes the Hilbert distance $H_\Omega(p, p')$.
Assuming $\Omega$ has $m$ sides, $B_H(\ell, r)$ can be computed in $O(m)$ time, and $p'$ can be computed in $O(\log m)$ time.
2.3 Support Vector Machines in the Euclidean Geometry
Before discussing SVMs in the Hilbert setting, it is illustrative to explore the differences with the Euclidean case. Given two point sets $P, Q \subset \mathbb{R}^d$ (see Figure 6(a)), which are assumed to be linearly separable, the objective is to compute a separating hyperplane $h$ that maximizes the margin, defined to be the minimum distance of either point set to $h$ (see Figure 6(b)). The points achieving the margin are called the support vectors. Local minimality considerations imply that, assuming general position, the number of support vectors does not exceed $d + 1$ [7].
Defining a slab to be the region bounded between two parallel hyperplanes, the Euclidean SVM problem can be interpreted as the problem of computing the slab of maximum orthogonal width that separates $P$ and $Q$. The SVM hyperplane lies midway between the slab's bounding hyperplanes, and the margin is just half the slab's width (see Figure 6(b)). An alternative and equivalent interpretation is to consider the maximum radius $r$ such that, by placing balls of radius $r$ about the points of $P$ and $Q$, the resulting sets of balls are linearly separable (see Figure 6(c)). By replacing Euclidean balls with metric balls, this observation can be readily generalized to any normed space.
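For illustration, the slab formulation can be solved directly as a small convex program. The following sketch uses the off-the-shelf modeling library cvxpy (purely for comparison; it is not the method developed in this paper): minimizing $\|w\|^2$ subject to $w \cdot p + b \ge 1$ for $p \in P$ and $w \cdot q + b \le -1$ for $q \in Q$ yields the maximum-margin separator, whose margin is $1/\|w\|$, half the slab's width.

```python
import numpy as np
import cvxpy as cp

def euclidean_svm(P, Q):
    """Hard-margin Euclidean SVM via the slab formulation.
    P and Q are (k x d) arrays of linearly separable points."""
    d = P.shape[1]
    w, b = cp.Variable(d), cp.Variable()
    constraints = [P @ w + b >= 1, Q @ w + b <= -1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    margin = 1.0 / np.linalg.norm(w.value)  # half the slab's width
    return w.value, b.value, margin
```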
3 SVMs in the Funk and Hilbert Geometries
In this section, we present our algorithms for solving the SVM problem in the Funk and Hilbert geometries. We consider both the planar and multidimensional cases. Recall that $\Omega$ is a full-dimensional convex polytope of total combinatorial complexity $m$. In both cases, we are given two linearly separable point sets $P, Q \subset \mathrm{int}(\Omega)$ (see Figure 7(a)). The objective is to compute a separating hyperplane $h$ that maximizes the minimum Hilbert (or Funk) distance from each point of $P \cup Q$ to $h$. Throughout, let $n = |P| + |Q|$. Although we treat both $n$ and $m$ as asymptotic quantities, we think of $m$ more as a large constant, since it defines a fixed geometry.
Just as in the Euclidean case (see Section 2.3), there are two equivalent formulations of the problem. In the first, we seek the separating hyperplane $h$ of maximum radius $r$ such that $B(h, r)$ contains no points of $P \cup Q$ in its interior (see Figure 7(b)). The other is to determine the maximum radius $r$ such that the union of balls $\bigcup_{p \in P} B(p, r)$ is linearly separable from $\bigcup_{q \in Q} B(q, r)$ (see Figure 7(c)). As in the Euclidean SVM problem, the support vectors are associated with the points of $P$ and $Q$ whose distance to the SVM separator equals the margin. We augment the concept of support vector to include not only the point but also the half-spoke that identifies the vertex of the Hilbert or Funk ball that lies on the separating hyperplane. Note that when dealing with the Funk distance, we use the forward sense of the distance when computing balls centered at the points, and we use the reverse-Funk distance when computing the ball centered about the hyperplane.
3.1 Local Optimality
In $\mathbb{R}^d$, we refer to the hyperplane that achieves the maximum Hilbert margin as the Hilbert SVM hyperplane for $P$ and $Q$, and we define the Funk SVM hyperplane analogously for the Funk distance. The following lemma provides a characterization of this hyperplane.
Lemma 7.
Given a convex polytope $\Omega$ in $\mathbb{R}^d$ and two linearly separable point sets $P, Q \subset \mathrm{int}(\Omega)$ in general position, the Hilbert (resp., Funk) SVM hyperplane $h$ satisfies the following condition. Let $r^*$ denote the margin achieved by $h$. There exist $d + 1$ pairs $(p_i, s_i)$, each consisting of a point $p_i \in P \cup Q$ and a half-spoke $s_i$ of $p_i$. For $1 \le i \le d + 1$, let $z_i$ denote the unique point at Hilbert (resp., Funk) distance $r^*$ from $p_i$ along $s_i$. The points $z_i$ all lie on $h$, and furthermore, if we partition $\{z_1, \ldots, z_{d+1}\}$ as $Z_P = \{z_i : p_i \in P\}$ and $Z_Q = \{z_i : p_i \in Q\}$, then $\mathrm{conv}(Z_P) \cap \mathrm{conv}(Z_Q) \neq \emptyset$.
Proof.
For any $r \ge 0$, let $P \oplus r$ denote the set of points of $\Omega$ whose distance from some point of $P$ is at most $r$, and define $Q \oplus r$ similarly for $Q$. As mentioned above, the SVM problem is equivalent to determining the maximum $r$ such that $P \oplus r$ and $Q \oplus r$ are linearly separable. As the value of $r$ increases towards the optimum margin $r^*$, the set of hyperplanes that separate these sets of balls becomes progressively smaller, until there is a unique separating hyperplane. Because Hilbert balls are convex polytopes, when this critical event occurs, the separating hyperplane is constrained by $d + 1$ contact points, which are generally vertices of these balls (see Figure 8(a) and (b)).
By Lemma 4, these vertices lie on the half-spokes of points from $P$ and $Q$, and each vertex is at distance $r^*$ from its respective point. Let $(p_i, s_i)$ denote these point-spoke combinations, and let $Z_P$ and $Z_Q$ be as defined in the statement of the lemma. If $\mathrm{conv}(Z_P) \cap \mathrm{conv}(Z_Q) = \emptyset$ (see Figure 8(c)), then there exists a $(d-2)$-dimensional affine subspace $g \subset h$ that separates them within $h$. By rotating $h$ about $g$, we can break the contact with these metric balls, implying that it would be possible to increase the margin, a contradiction.
3.2 LP-Type Problems
In this section, we show that our Hilbert and Funk SVM problems possess properties similar to those of low-dimensional linear programming, which leads to an efficient algorithmic solution. In particular, we show that these problems have a property called LP-type [30]. This type of problem takes as input a finite set $S$ and an objective function $f$ to be maximized, which induces a total ordering over the subsets of $S$. Generally, an LP-type problem is defined as follows.
Definition 8 (LP-type [30]).
An optimization problem over a set $S$ and objective function $f$ is of LP-type if:
- Monotonicity: For sets $F \subseteq G \subseteq S$, $f(F) \ge f(G)$.
- Locality: For sets $F \subseteq G \subseteq S$ and every $x \in S$, if $f(F) = f(G) = f(F \cup \{x\})$, then $f(G) = f(G \cup \{x\})$.
A basis of an LP-type problem is a set $B \subseteq S$ with the property that every proper subset of $B$ has a larger objective-function value than $B$ itself. The combinatorial dimension of an LP-type problem is defined to be the maximum cardinality of a basis.
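To make the algorithmic framework concrete, the following is a minimal Welzl-style randomized sketch for solving an LP-type problem (assuming general position, so that bases are well behaved, and free access to the objective $f$ on constant-size sets; the algorithms cited in Section 1 add optimizations that we omit). It uses only the two primitives discussed in Section 4: objective evaluations on small sets and violation tests.

```python
import random

def lp_type_solve(S, f, dim):
    """Welzl-style randomized solver sketch for an LP-type problem.
    S is the ground set, f evaluates the (maximized) objective on small
    sets, and dim is the combinatorial dimension (max basis size)."""
    def solve(elements, fixed):
        if not elements or len(fixed) == dim:
            return fixed
        i = random.randrange(len(elements))
        x, rest = elements[i], elements[:i] + elements[i + 1:]
        basis = solve(rest, fixed)
        # Violation test: does adding x strictly decrease the margin?
        if f(basis + (x,)) < f(basis):
            basis = solve(rest, fixed + (x,))
        return basis

    basis = solve(list(S), tuple())
    return basis, f(basis)
```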
A natural choice for $S$ would be $P \cup Q$, and a basis would consist of points of this set. While this suffices for the simpler Funk distance and for the Hilbert distance in the plane, we will need more information for the multidimensional Hilbert SVM problem. In particular, we will use pairs consisting of a point of $P \cup Q$ and a half-spoke for this point. As seen in Lemma 7, these entities locally define the SVM hyperplane. Given $\Omega$, $P$, and $Q$, define $S^*$ to be the set of pairs $(p, s)$, where $p \in P \cup Q$ and $s$ is any half-spoke emanating from $p$. Such a pair is called a point-spoke. In the multidimensional setting, the role of the set $S$ in the definition of LP-type will be played by $S^*$.
We define the objective function for the multidimensional Hilbert SVM problem as follows. Recall that $\mathcal{H}$ denotes the set of $(d-1)$-dimensional hyperplanes that separate $P$ and $Q$. Given a hyperplane $h \in \mathcal{H}$ and a point-spoke $(p, s)$, define $\delta_s(p, h)$ to be the Hilbert distance from $p$ to $h$ measured along the half-spoke $s$. If $s$ does not intersect $h$, define this distance to be $+\infty$. In the planar case, given any set $T \subseteq P \cup Q$, we define $f(T)$ to be
$$f(T) \;=\; \max_{h \in \mathcal{H}} \, \min_{p \in T} \, H_\Omega(p, h),$$
where $H_\Omega(p, h)$ denotes the Hilbert distance from $p$ to $h$ (the Funk distance is used analogously for the Funk SVM). In the multidimensional case, given any $h \in \mathcal{H}$ and $T \subseteq S^*$, define
$$\delta(h, T) \;=\; \min_{(p, s) \in T} \delta_s(p, h).$$
Given $p$, define $S^*(p)$ to be the set of half-spokes of $p$. By convexity of Hilbert balls, for any $p$, the half-spoke of $S^*(p)$ that defines the distance from $p$ to any hyperplane is the one that minimizes the value of $\delta_s$. Thus, the objective function for our LP-type problem is
$$f(T) \;=\; \max_{h \in \mathcal{H}} \, \delta(h, T).$$
Our next result shows that Hilbert SVM satisfies these conditions.
Lemma 9.
Given a convex polytope $\Omega$ in $\mathbb{R}^d$ and linearly separable point sets $P, Q \subset \mathrm{int}(\Omega)$, the Hilbert and Funk SVM problems are both LP-type problems of combinatorial dimension $d + 1$.
Proof.
We proved in Lemma 7 that for either distance measure, any optimal solution is determined by a basis consisting of $d + 1$ point-spoke pairs, which implies that the combinatorial dimension is $d + 1$. Monotonicity holds trivially by the fact that adding more point-spokes to a set can never increase the minimum distance from that set to any hyperplane.
To establish locality, suppose that for some sets $F \subseteq G \subseteq S^*$, we have $f(F) = f(G)$. We assert that the maximum-margin separating hyperplane is unique. To see why, suppose to the contrary that there are two distinct optimal separating hyperplanes, $h_1$ and $h_2$, both with equal margins. These hyperplanes intersect in a $(d-2)$-dimensional affine subspace $g$ (possibly external to $\Omega$). Every strict convex combination of these two hyperplanes yields a separating hyperplane that passes through $g$. Assuming general position (and particularly that no two point-spokes are equidistant from such a hyperplane in terms of the Hilbert/Funk distance), it follows that such a hyperplane does not contact all $d + 1$ basis point-spokes. Since $d + 1$ point-spokes are needed to fix a hyperplane, there exists a perturbation of this hyperplane that yields a strictly larger margin, a contradiction.
Therefore, it follows that $F$ and $G$ determine the same (unique) optimal hyperplane, and the support vectors that form this basis are all members of $F$. We may assume that $x \notin G$, since otherwise the conclusion holds trivially. Since $f(F) = f(F \cup \{x\})$, the metric ball centered at the point of $x$ whose radius is equal to the margin does not intersect the separating hyperplane. Therefore, adding $x$ to $G$ does not change the margin value, which implies that $f(G) = f(G \cup \{x\})$, as desired.
4 Solving the LP-Type Problems
As mentioned above, in any fixed-dimensional space, an LP-type problem over a ground set of size $N$ can be solved with $O(N)$ calls to two primitives, assuming free access to them. The first involves evaluating the objective function for a given basis set. The second, the violation test, checks whether a given ground-set element causes the objective function to decrease. In this section, we will explore the running times of these primitives.
Given point sets $P$ and $Q$ of total size $n$, recall that the ground set for the Funk SVM (in all dimensions) and the planar Hilbert SVM problems consists of $P \cup Q$. Thus, $N = n$. For the multidimensional Hilbert SVM, the ground set consists of all point-spoke pairs $S^*$. Assuming that $\Omega$ has total combinatorial complexity $m$, $N = O(n m^2)$. The overall running time to solve the SVM problems will be the product of the size of the ground set and the running time of the two primitives.
Our algorithms for computing the objective function will return both the SVM hyperplane $h$ and the margin $r$ as a witness. This implies that the violation tests are relatively easy. In the case of the Funk distance and the two-dimensional Hilbert distance, we are given a point $p$ to test. In the Funk case, by Lemma 5, we can compute the distance from $p$ to $h$ in $O(\log m)$ time in $\mathbb{R}^2$ and in $O(v)$ time in the multidimensional case. In the two-dimensional Hilbert case, by Lemma 6, we can compute the distance from $p$ to $h$ in $O(\log m)$ time. In all cases, there is a violation if and only if this distance is less than $r$.
The remaining case is the multidimensional Hilbert geometry. In this case, we are given a point-spoke pair $(p, s)$ to test. In $O(m)$ time, we can determine the facets hit by the endpoints of the associated chord. We can then determine the point of intersection of this chord with the hyperplane and compute the distance from $p$ to the hyperplane along this spoke in constant time. Again, a violation occurs if this distance is less than $r$. All of these tests can be performed within the time bounds needed for the objective function evaluation, so they do not dominate the overall running time.
4.1 The Funk Objective Function
We will first discuss the objective function for the Funk SVM, since it is the easier of the two. In the planar case, Lemma 7 implies that a basis consists of three points, say $\{p_1, p_2, q\}$. From the lemma, we know that two points must come from one set and one from the other, so there is no loss of generality in assuming that $p_1, p_2 \in P$ and $q \in Q$. Let $h$ be the SVM line we seek. Consider the supporting line of $\Omega$ that lies on the opposite side of $h$ from $p_1$ and $p_2$. Let $w$ denote the vertex of $\Omega$ incident to this supporting line. By Lemma 5(ii), the closest points $p_1'$ and $p_2'$ lie on the segments $\overline{p_1 w}$ and $\overline{p_2 w}$, respectively (see Figure 9(a)). Since $F_\Omega(p_1, p_1') = F_\Omega(p_2, p_2')$, it follows that $h$ is parallel to $\ell(p_1, p_2)$.
Given the two points $p_1$ and $p_2$, we can compute the vertex $w$ in $O(\log m)$ time by binary search on the boundary of $\Omega$. We can compute the parallel supporting line on the opposite side of $\Omega$ in $O(\log m)$ time, which is used to compute $q$'s Funk distance. From these two lines, it is straightforward to compute the location of the parallel line $h$ that equalizes the Funk distances among all three points.
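This last calculation can be sketched as follows. Working in signed offsets along the normal direction of $\ell(p_1, p_2)$ (our own parametrization, not the paper's notation), with $p_1, p_2$ at offset $a$, $q$ at offset $b < a$, and the supporting vertices below and above at offsets $c$ and $c'$, the two Funk distances are $\ln((a - c)/(y - c))$ and $\ln((c' - b)/(c' - y))$ for a candidate line at offset $y$; equating them gives a closed form.

```python
def equalizing_offset(a, b, c, cp_):
    """Signed offset y of the Funk SVM line h along the unit normal of
    line(p1, p2). Here a is the offset of p1 and p2, b < a is the offset
    of q, and c <= b and cp_ >= a are the offsets of the supporting
    vertices of Omega below and above. Equating
    ln((a - c)/(y - c)) = ln((cp_ - b)/(cp_ - y)) and solving for y."""
    return ((a - c) * cp_ + (cp_ - b) * c) / ((a - c) + (cp_ - b))

# Sanity check in a symmetric configuration: h should lie midway.
assert abs(equalizing_offset(1.0, -1.0, -2.0, 2.0)) < 1e-12
```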
By combining this with the observations from Section 4, we can compute both the violation test and the objective function for the planar Funk SVM in $O(\log m)$ time. Given that the ground set has $n$ points, the overall running time is $O(n \log m)$, and Theorem 2(i) follows.
Next, let us consider the $d$-dimensional case, for any fixed $d$. A basis consists of $d + 1$ points $\{p_1, \ldots, p_{d+1}\}$. By Lemma 5(i), the basis points lie on the two sides of a slab that is parallel to the desired SVM hyperplane. Let $h$ denote this hyperplane. To compute $h$'s orientation, observe that it can be represented as $\{x \in \mathbb{R}^d : a \cdot x = b\}$, for some nonzero vector $a$. Thus, there exist reals $b^+$ and $b^-$ such that the points above the slab satisfy $a \cdot p_i = b^+$ and the points below the slab satisfy $a \cdot p_i = b^-$. This yields a system of $d + 1$ homogeneous linear equations in the $d + 2$ unknowns $(a, b^+, b^-)$. We can solve these equations to determine $a$ up to a nonzero scale factor.
This fixes the orientation of $h$, but not its exact location. To compute its location, we first determine the two supporting hyperplanes of $\Omega$ parallel to $h$. This can be done in $O(v)$ time, where $v$ denotes the number of vertices of $\Omega$. Given these hyperplanes, we take any basis point of $P$ and any basis point of $Q$, and through a straightforward calculation, in $O(1)$ time we can determine the parallel hyperplane that equalizes the Funk distance to each.
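A minimal sketch of the orientation step: stack one homogeneous equation per basis point and read off the null vector with an SVD (assuming the generic case of a one-dimensional null space).

```python
import numpy as np

def slab_orientation(above, below):
    """Recover the hyperplane normal a (up to scale) from the basis points.
    Points above the slab satisfy a . p = b_plus; points below satisfy
    a . p = b_minus. Each row encodes one homogeneous equation in the
    unknowns (a, b_plus, b_minus)."""
    rows = [np.concatenate([p, [-1.0, 0.0]]) for p in above]
    rows += [np.concatenate([p, [0.0, -1.0]]) for p in below]
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)
    sol = vt[-1]  # right singular vector of the smallest singular value
    return sol[:-2], sol[-2], sol[-1]  # a, b_plus, b_minus
```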
In summary, assuming that the dimension $d$ is fixed, both the violation test and the objective function can be computed in $O(v)$ time. Given that the ground set has $n$ points, the overall running time is $O(n v)$. This establishes Theorem 2(ii).
4.2 The Multidimensional Hilbert Objective Function
Next, let us consider how to compute the objective function for the $d$-dimensional case, for any fixed $d$. A basis consists of $d + 1$ point-spoke pairs $(p_i, s_i)$. We claim that we can compute the objective function for this basis in $O(m)$ time. To see this, observe that $h$ can be represented as $\{x \in \mathbb{R}^d : a \cdot x = 1\}$, for some nonzero vector $a$. For a given point-spoke pair $(p_i, s_i)$, in $O(m)$ time we can compute the points $x_i$ and $y_i$, where the extension of the half-spoke intersects $\partial\Omega$. Let us assume this is oriented so that $s_i$ is directed from $x_i$ towards $y_i$. For a given distance $r$, let $z_i(r)$ denote the point along this spoke that is at Hilbert distance $r$ from $p_i$. To simplify the calculation, let $z_i = z_i(r)$, and let us assume that the chord is affinely mapped to the interval $[0, 1]$ on the real line, which maps $p_i$ and $z_i$ to points in this interval (which we will continue to refer to as $p_i$ and $z_i$). By Eq. (3), for $0 < p_i \le z_i < 1$,
$$H_\Omega(p_i, z_i) \;=\; \frac{1}{2} \ln\!\left( \frac{z_i\,(1 - p_i)}{p_i\,(1 - z_i)} \right) \;=\; r.$$
Together with the constraints $a \cdot z_i = 1$ (from the fact that $z_i$ lies on $h$, interpreting $z_i$ here as the corresponding point on the original chord) and $0 \le z_i \le 1$ (from the requirement that the $z_i$'s lie in the interval $[0, 1]$), this yields a system of $O(d)$ algebraic equations of constant degree in the unknowns $a$, $z_1, \ldots, z_{d+1}$, and $e^{2r}$. (Note that the values $p_i$ are fixed.) Since $d$ is a fixed constant, we can solve this system in $O(1)$ time. If this system has a feasible solution, then the objective function value is $r$, and the values of the $z_i$'s determine the hyperplane by specifying the points where the separating hyperplane crosses each of the half-spokes.
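For intuition, each distance equation can be solved for $z_i$ in isolation. Writing $\lambda = e^{2r}$ and solving $\frac{1}{2} \ln \frac{z(1-p)}{p(1-z)} = r$ for $z$ gives the closed form below (a sketch in the chord's unit-interval parametrization).

```python
def spoke_crossing(p_bar, lam):
    """Position z in [0, 1] along the chord (affinely mapped to the unit
    interval) at Hilbert distance r from the point at position p_bar,
    where lam = e^{2r}. Derived from (1/2) ln(z(1-p)/(p(1-z))) = r."""
    return p_bar * lam / (1.0 - p_bar + p_bar * lam)

assert spoke_crossing(0.4, 1.0) == 0.4  # r = 0 leaves the point in place
```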
The time to compute the objective function is dominated by the time needed to compute the endpoints of each of the $d + 1$ half-spoke extensions. By Lemma 4, each point gives rise to $O(m^2)$ half-spokes, which implies that the ground set is of size $O(n m^2)$. Thus, the overall running time is $O(n m^3)$, and this establishes Theorem 1(ii).
4.3 The Hilbert Objective Function in the Plane
Finally, let us consider computing the objective function for the planar Hilbert SVM. From Lemma 7, a basis consists of three support vectors, which must include at least one point from each of $P$ and $Q$. As before, if we are presented with a subset that does not meet these conditions, we report a value of $+\infty$ for the objective function. We will focus on the more interesting case of computing the Hilbert SVM objective function for a basis consisting of three points $\{p_1, p_2, q\}$. There is no loss of generality in assuming that $p_1, p_2 \in P$ and $q \in Q$. As observed earlier, the value of the objective function is the maximum $r$ such that the balls $B(p_1, r)$ and $B(p_2, r)$ are linearly separable from $B(q, r)$. The resulting separating line is equidistant to all three points (see Figure 8(a)). Note that there need not be a separating line that is equidistant to three points (see Figure 8(c)). In this section, we will present an $O(\log^3 m)$-time algorithm that either finds the desired equidistant separating line or reports that no such line exists. Note that if the line does not exist, the value of the objective function reduces to a two-point case, either $\{p_1, q\}$ or $\{p_2, q\}$, whichever achieves the smaller separation distance.
4.3.1 Overview
Let us begin with an overview of the process. For the sake of illustration, we may assume that the points are given in left-to-right order (increasing $x$-coordinates) as $\langle p_1, q, p_2 \rangle$, and $q$ lies vertically below a point on the line segment $\overline{p_1 p_2}$. This implies that there is no vertical separating line, and for any separating line, $p_1$ and $p_2$ will be above it and $q$ below.
By the observations made before Lemma 6, we know that for any $p \in \{p_1, p_2, q\}$ and any line $\ell$, the distance from $p$ to $\ell$ is determined by its closest point $p'$, which lies on the closest-point spoke of $p$ with respect to $\ell$ (recall Figure 5(c)). Let $h$ denote the equidistant separating line (assuming that it exists), let $p_1'$ denote the closest point of $p_1$ on $h$, and define $p_2'$ and $q'$ analogously. It suffices to compute $p_1'$, $p_2'$, and $q'$, or, failing this, to report that there is no equidistant separating line for these points.
Our approach will be based on a coordinated binary search. Let us suppose for now that an equidistant line exists, and let $r^*$ denote the Hilbert distance of each point from this line. For any radius $r$, where $0 \le r \le r^*$, let $\ell_1(r)$ denote the unique line that is cross-tangent to the balls $B(p_1, r)$ and $B(q, r)$, with $B(p_1, r)$ above and $B(q, r)$ below (see Figure 10(a)). Clearly, this line varies continuously as a function of $r$. Observe that $p_2$ lies above $\ell_1(r)$, and $p_2$'s ball lies above the line until $r$ reaches the value $r^*$, after which this condition fails. This suggests an approach based on a parametric search over the radius $r$.
To do this effectively, we need to establish a small set of candidate radii that contains the optimal radius $r^*$. We will organize the search around the contact points where the cross-tangent line intersects the boundary of the metric balls. We will show that these boundary points vary monotonically in cyclic order around the center point as the radius increases. Since we know that there is no vertical separating line, we may restrict consideration to the lower half of $p_1$'s and $p_2$'s metric balls (that is, the lower portion of the boundary between the leftmost and rightmost points) and the upper portion of $q$'s metric ball.
Recall that for any radius, each vertex of a Hilbert ball lies on one of the half-spokes passing through the center point and each of the vertices of $\Omega$. Given a point $p$, a vertex of any Hilbert ball centered at $p$ is uniquely identified by the half-spoke emanating from $p$ and the ball's radius $r$. We refer to this half-spoke as a virtual vertex. The actual vertex is the point along this half-spoke at distance $r$ from $p$. Similarly, an edge of $p$'s metric ball is uniquely identified by the cone with apex at $p$ between each pair of consecutive half-spokes. This cone defines a virtual edge and, for a given radius $r$, the actual edge is determined by the line segment between the associated actual vertices.
For any point, there are $O(m)$ virtual vertices and $O(m)$ virtual edges. Since these entities are associated with elements on the boundary of $\Omega$, it is possible to index these entities after $O(m)$ preprocessing time to split the spokes about the center point into half-spokes and merge them in cyclic order about the center point. Our algorithm involves a cyclic search among the virtual edges and/or the virtual vertices about each metric ball, which is performed by a coordinated binary search (a sketch of the half-spoke indexing is given below). The numerous technical details are presented in Section A.
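The following sketch illustrates the indexing of half-spokes for the planar case (a sort is used for brevity; the $O(m)$ bound follows by merging the two already-sorted cyclic lists, as described above).

```python
import numpy as np

def half_spokes_in_cyclic_order(poly, p):
    """Directions of the 2m half-spokes at p, in angular order. Each
    vertex v of poly contributes the direction v - p and its antipode."""
    dirs = []
    for v in poly:
        u = np.asarray(v, dtype=float) - np.asarray(p, dtype=float)
        dirs.append(u)
        dirs.append(-u)
    dirs.sort(key=lambda u: np.arctan2(u[1], u[0]))
    return dirs
```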
5 Conclusions
In this paper, we presented efficient algorithms for solving the SVM problem in the Hilbert and Funk geometries for a set of $n$ points in convex polygons and polytopes of combinatorial complexity $m$. Our results rely on the fact that the Hilbert SVM problem has LP-type structure.
Our planar results are quite efficient, but our multidimensional results are polynomial in the total combinatorial complexity $m$ of the polytope $\Omega$, which generally scales exponentially with the dimension. A natural question is whether this dependency can be reduced, say, to a polynomial in the combinatorial complexity of $\Omega$, measured either as the number of facets or the number of vertices. One challenge to this is the issue that the number of vertices on the Hilbert ball grows quadratically in the number of faces of all dimensions of the polytope. Worse yet, the combinatorial structure of the Hilbert ball varies with the location of the center point.
These challenges suggest that the problem may be hard to solve in the case of general polytopes, but there are special cases of the multidimensional Hilbert geometry where the problem is still interesting. For example, Nielsen and Sun have demonstrated the usefulness of the Hilbert metric in applications of machine learning and clustering [25] for the multidimensional simplicial Hilbert geometry. Yet another direction for future work would be to consider kernel shapes other than hyperplanes.
Finally, it would be interesting to know whether our results can be extended to the reverse-Funk distance. Unlike the Funk case, the reverse-Funk ball about a hyperplane is not a slab. It seems reasonable to conjecture that the methods that were used for the more complicated Hilbert geometry could be applied to the case of the reverse-Funk SVM.
References
- [1] A. Abdelkader and D. M. Mount. Economical Delone sets for approximating convex bodies. In Proc. 16th Scand. Workshop Algorithm Theory, pages 4:1–4:12, 2018. doi:10.4230/LIPIcs.SWAT.2018.4.
- [2] A. Abdelkader and D. M. Mount. Convex approximation and the Hilbert geometry. In Proc. SIAM Symp. Simplicity in Algorithms (SOSA24), pages 286–298, 2024. doi:10.1137/1.9781611977936.26.
- [3] S. Arya, G. D. da Fonseca, and D. M. Mount. Economical convex coverings and applications. In Proc. 34th Annu. ACM-SIAM Sympos. Discrete Algorithms, pages 1834–1861, 2023. doi:10.1137/1.9781611977554.ch70.
- [4] A. Ben-Hur, C. S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch. Support vector machines and kernels for computational biology. PLoS Comput. Biology, 4(10):e1000173, 2008. doi:10.1371/journal.pcbi.1000173.
- [5] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2016. doi:10.1007/978-0-387-45528-0.
- [6] M. Bumpus, C. Dai, A. H. Gezalyan, S. Munoz, R. Santhoshkumar, S. Ye, and D. M. Mount. Software and analysis for dynamic Voronoi diagrams in the Hilbert metric, 2023. arXiv:2304.02745.
- [7] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov., 2(2):121–167, 1998. doi:10.1023/A:1009715923555.
- [8] O. Chapelle, P. Haffner, and V. N. Vapnik. Support vector machines for histogram-based image classification. IEEE Trans. Neural Netw., 10:1055–1064, 1999. doi:10.1109/72.788646.
- [9] B. Chazelle and J. Matoušek. On linear-time deterministic algorithms for optimization problems in fixed dimension. Journal of Algorithms, 21:579–597, 1996. doi:10.1006/jagm.1996.0060.
- [10] H. Cho, B. DeMeo, J. Peng, and B. Berger. Large-margin classification in hyperbolic space. In Proc. 22nd Internat. Conf. Artif. Intell. and Stat., pages 1832–1840. PMLR, 2019. doi:10.48550/arXiv.1806.00437.
- [11] K. L. Clarkson. Las Vegas algorithms for linear and integer programming when the dimension is small. J. Assoc. Comput. Mach., 42:488–499, 1995. doi:10.1145/201019.201036.
- [12] C. Cortes and V. Vapnik. Support-vector networks. Mach. Learn., 20:273–297, 1995. doi:10.1007/BF00994018.
- [13] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
- [14] S. Das, S. R. Dev, and Sarvottamananda. A worst-case optimal algorithm to compute the Minkowski sum of convex polytopes. Discrete Appl. Math., 350:44–61, 2024. doi:10.1016/j.dam.2024.02.004.
- [15] D. DeCoste and B. Schölkopf. Training invariant support vector machines. Mach. Learn., 46:161–190, 2002. doi:10.1023/A:1012454411458.
- [16] G. M. Foody and A. Mathur. A relative evaluation of multiclass image classification by support vector machines. IEEE Trans. Geosci. Remote Sensing, 42:1335–1343, 2004. doi:10.1109/TGRS.2004.827257.
- [17] A. H. Gezalyan, S. Kim, C. Lopez, D. Skora, Z. Stefankovic, and D. M. Mount. Delaunay triangulations in the Hilbert metric. In Proc. 19th Scand. Workshop Algorithm Theory, pages 25:1–25:17, 2024. doi:10.4230/LIPIcs.SWAT.2024.25.
- [18] A. H. Gezalyan and D. M. Mount. Voronoi diagrams in the Hilbert metric. In Proc. 39th Internat. Sympos. Comput. Geom., pages 35:1–35:16, 2023. doi:10.4230/LIPIcs.SoCG.2023.35.
- [19] D. Hilbert. Ueber die gerade Linie als kürzeste Verbindung zweier Punkte. Math. Annalen, 46:91–96, 1895. doi:10.1007/BF02096204.
- [20] T. Hofmann, B. Schölkopf, and A. J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36:1171–1220, 2008. doi:10.1214/009053607000000677.
- [21] S. Huang, N. Cai, P. P. Pacheco, S. Narrandes, Y. Wang, and W. Xu. Applications of support vector machine (SVM) learning in cancer genomics. Cancer Genomics & Proteomics, 15:41–51, 2018. doi:10.21873/cgp.20063.
- [22] T. Joachims. Learning to Classify Text Using Support Vector Machines. Springer, 2002. doi:10.1007/978-1-4615-0907-3.
- [23] S. Krig. Computer Vision Metrics: Survey, Taxonomy, and Analysis. Springer Nature, 2014.
- [24] F. Nielsen and L. Shao. On balls in a Hilbert polygonal geometry (multimedia contribution). In Proc. 33rd Internat. Sympos. Comput. Geom., pages 67:1–67:4, 2017. doi:10.4230/LIPIcs.SoCG.2017.67.
- [25] F. Nielsen and K. Sun. Clustering in Hilbert’s projective geometry: The case studies of the probability simplex and the elliptope of correlation matrices. In Frank Nielsen, editor, Geometric Structures of Information, pages 297–331. Springer Internat. Pub., 2019. doi:10.1007/978-3-030-02520-5_11.
- [26] A. Papadopoulos and M. Troyanov. From Funk to Hilbert geometry. In Handbook of Hilbert geometry, volume 22 of IRMA Lectures in Mathematics and Theoretical Physics, pages 33–68. European Mathematical Society Publishing House, 2014. doi:10.4171/147-1/2.
- [27] A. Papadopoulos and M. Troyanov. Handbook of Hilbert geometry, volume 22 of IRMA Lectures in Mathematics and Theoretical Physics. European Mathematical Society Publishing House, 2014. doi:10.4171/147.
- [28] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Pearson, 4th edition, 2021.
- [29] R. Seidel. Low dimensional linear programming and convex hulls made easy. Discrete Comput. Geom., 6:423–434, 1991. doi:10.1007/BF02574699.
- [30] M. Sharir and E. Welzl. A combinatorial bound for linear programming and related problems. In Proc. 9th Internat. Sympos. on Theoret. Aspects of Comp. Sci., pages 567–579, 1992. doi:10.1007/3-540-55210-3_213.
- [31] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008. doi:10.1007/978-0-387-77242-4.
- [32] R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2022. doi:10.1007/978-3-030-34372-9.
- [33] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. J. Mach. Learn. Res., 2:45–66, 2001. doi:10.1162/153244302760185243.
- [34] V. N. Vapnik and A. Y. Chervonenkis. On a class of algorithms of learning pattern recognition. Automation and Remote Control, 25(6):937–945, 1964. (In Russian). URL: http://mi.mathnet.ru/at11678.
- [35] C. Vernicos. On the Hilbert geometry of convex polytopes. In Handbook of Hilbert geometry, volume 22 of IRMA Lectures in Mathematics and Theoretical Physics, pages 111–126. European Mathematical Society Publishing House, 2014. doi:10.48550/arXiv.1406.0733.
- [36] Constantin Vernicos and Cormac Walsh. Flag-approximability of convex bodies and volume growth of Hilbert geometries, 2018. arXiv:1809.09471.
- [37] H. Wendland. Scattered Data Approximation. Cambridge University Press, 2004. doi:10.1017/CBO9780511617539.
Appendix A The Hilbert Objective Function in the Plane
In this section, we present the technical details of the evaluation of the Hilbert SVM objective function in the planar case, which were deferred from the main body of the paper.
A.1 Incidence Points and Candidate Radii
Define $I_1(r)$ to be the incident virtual vertex or virtual edge of $\ell_1(r)$ with $B(p_1, r)$ (recall Figure 10(b)). (Assuming general position, a line will be incident to a vertex only, but we will be interested in those radii where the line is incident to an edge.) Similarly, define $I_q(r)$ to be the incident vertex or edge of $\ell_1(r)$ with $B(q, r)$. Symmetrically, define $\ell_2(r)$ to be the unique line that is cross-tangent to the balls $B(p_2, r)$ and $B(q, r)$, with $B(p_2, r)$ lying above and $B(q, r)$ lying below, and define $I_2(r)$ and $I_q'(r)$ analogously. Our approach is to conduct a coordinated binary search among these three sorted lists of incidence entities to determine the particular triple of entities that yields the equidistant separating line.
To limit the search to a discrete set, the candidate radius values will be chosen so that one of the lines $\ell_1(r)$ or $\ell_2(r)$ is incident to an edge of one of the three metric balls. For each virtual edge $e$ of the metric ball centered at $p_1$, let $\varrho_1(e)$ denote the set of radius values $r$ such that $I_1(r) = e$. Define $\varrho_2(e)$, $\varrho_q(e)$, and $\varrho_q'(e)$ analogously, where $e$ comes from the appropriate metric ball ($p_2$'s, $q$'s, and again $q$'s, respectively). We will show below that each such set is either empty or consists of a single element.
Lemma 10.
Given the point configuration defined above and any edge $e$ of $p_1$'s Hilbert ball, if $\varrho_1(e)$ exists, then it is unique. (The same holds for the other point combinations, $\varrho_2(e)$, $\varrho_q(e)$, and $\varrho_q'(e)$, where $e$ is chosen from the appropriate ball.)
Proof.
If $\varrho_1(e)$ exists, then by definition there is a radius $r$ such that the cross-tangent between $B(p_1, r)$ and $B(q, r)$ is incident to $e$. Let $r_1$ be the smallest such radius and suppose, to the contrary, that there exists a larger radius $r_2$ such that $I_1(r_2) = e$. Thus, the two cross-tangents $\ell_1(r_1)$ and $\ell_1(r_2)$ are both incident on (the virtual edge) $e$.
By Lemma 3, every edge of any Hilbert ball centered at a point of $\Omega$ is concurrent with the point where the linear extensions of two boundary edges of $\Omega$ meet. The point of concurrency is clearly external to $\Omega$. Let $w$ denote this point. Clearly, $w$ cannot lie within the portion of either line between the contact points of the lines with either metric ball (which is internal to $\Omega$). As the radius increases from $r_1$ to $r_2$, the line rotates about $w$ in such a way that it moves closer to $q$. Since the radius increases, the ball about $q$ becomes larger, implying that $\ell_1(r_2)$ intersects $B(q, r_2)$ (see Figure 10(c)). This contradicts the fact that this line is a cross-tangent between the two balls.
Since this argument only uses properties of cross-tangents of Hilbert balls, the same proof applies to $\varrho_2$, $\varrho_q$, and $\varrho_q'$.
The second property that we will need to drive a binary search is monotonicity. As the radius increases, the virtual point of incidence moves monotonically around the boundary of the Hilbert ball (see Figure 11(a)). This would be easy if the metric balls were simple scalings of each other (as occurs, for example, with the Funk distance). In the case of the Hilbert distance, our proof exploits the concurrency property of the edges of the metric ball (recall Lemma 3). As in the previous lemma, we will show this for $I_1$, but the argument applies analogously to the others.
Lemma 11 (Monotonicity).
Given the point configuration defined above, $I_1(r)$ varies monotonically with $r$ about the boundary of $p_1$'s metric ball. (The same holds for the other point combinations $I_2$, $I_q$, and $I_q'$, where the motion is counterclockwise for the combinations involving $\ell_1$ and clockwise for those involving $\ell_2$.)
Proof.
As the radius $r$ increases, the cross-tangent line changes continuously. Thus, as $r$ increases, the object of incidence forms a sequence that alternates between the vertices and edges of the metric ball. If there is any instance where monotonicity fails, some edge must be visited twice in this sequence. However, this would imply that there are two distinct radius values associated with this edge, which contradicts Lemma 10.
A.2 Search Procedure
We have now laid the necessary foundations to describe our search. Intuitively, we will label the edges and vertices as being either "low" or "high", depending on whether a cross-tangent incident to this entity corresponds to a radius value that is smaller than or larger than the optimum value $r^*$, respectively. Given an edge $e$ of $p_1$'s metric ball, if $\varrho_1(e)$ exists and the associated cross-tangent passes below $B(p_2, \varrho_1(e))$, we say that the edge is low. If the candidate radius exists, but the cross-tangent does not pass below this ball, we say it is high. If the candidate radius does not exist, we say that the edge is infeasible.
Each vertex can be categorized similarly. Given a vertex $u$ of $p_1$'s metric ball, if there exists a radius value $r$ such that $I_1(r) = u$ and $\ell_1(r)$ passes below $B(p_2, r)$, we label this vertex as low. If there exists such a radius $r$, and $\ell_1(r)$ does not pass below $B(p_2, r)$, we label it as high. Unlike edges, a single vertex may be labeled as both low and high. Finally, if there is no such radius, we label the vertex infeasible. By our earlier observation that it suffices to limit consideration to the lower half of $p_1$'s and $p_2$'s metric balls, we can further categorize infeasible entities as lying to the left or right of the low-to-high transition point.
As expected, monotonicity implies that if there is an equidistant separating line, the boundary entities of both $p_1$'s and $p_2$'s metric balls have a single transition point from low to high. (Assuming general position, this will take the form of a vertex that is labeled both low and high.) The following lemma summarizes the observations made in this section.
Lemma 12.
Given the point configuration defined above, if they are not entirely infeasible, the relevant portions of the boundaries of the metric balls for $p_1$ and $p_2$ consist of a single contiguous low portion followed by a contiguous high portion (possibly sharing a common vertex that is both low and high).
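All of the searches that follow instantiate the same pattern: a binary search over a sequence of boundary entities that Lemma 12 (and later Lemma 13) guarantees is unimodal. A generic sketch, assuming the relevant boundary portion is indexed so that its first entity is low, and that infeasible probes are resolved to the side of the transition on which they lie:

```python
LOW, HIGH = -1, +1

def transition_index(num_entities, classify):
    """Binary search for the low-to-high transition promised by Lemma 12.
    classify(i) probes the i-th entity of the relevant boundary portion
    and returns LOW or HIGH. Returns the index of the last low entity."""
    lo, hi = 0, num_entities - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if classify(mid) == LOW:
            lo = mid
        else:
            hi = mid - 1
    return lo
```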
In light of this lemma, the first task of the search is to find the virtual vertex of $p_1$'s Hilbert ball where the transition from low to high occurs, and then we do the same for $p_2$'s Hilbert ball. Let us consider how this is done for $p_1$; the algorithm for $p_2$ is symmetrical. Consider any (virtual) edge $e$ of $p_1$'s Hilbert ball. We compute the radius $\varrho_1(e)$ by means of a binary search along $q$'s metric ball's boundary. Lemma 3 implies that there exists a point $w$ (external to $\Omega$) about which the virtual edge rotates as the radius changes. Let $e(r)$ denote the actual location of this edge for any given radius $r$ (see Figure 12). Consider any virtual vertex of $q$'s Hilbert ball. Define $z(r)$ to be the actual location of the point along this half-spoke whose distance from $q$ is $r$.
Because $w$ is external to $\Omega$, $z(r)$ moves continuously and monotonically along this directed spoke relative to the rotating edge $e(r)$. Therefore, there is at most one value $r$ where $z(r)$ lies on the linear extension of $e(r)$. If no such $r$ exists, then this spoke is infeasible for $e$. Otherwise, fix $r$ at the value where $z(r)$ and $e(r)$ are collinear. In $O(\log m)$ time, we can test whether this common line is a supporting line to $B(q, r)$ that separates the two balls (see Figure 12(a)). If so, then we have successfully found $\varrho_1(e)$. Otherwise, the line intersects $B(q, r)$ in one of two ways, either entering or exiting the ball at $z(r)$. Depending on which case applies, monotonicity (Lemma 11) indicates whether we are "low" or "high" with respect to $q$'s boundary (see Figure 12(b) and (c)). After $O(\log m)$ probes to $q$'s spokes, either we succeed in finding $\varrho_1(e)$ or we fail. In case of failure, we classify $e$ as "infeasible". If it is infeasible, we further classify it as lying to the left or right of the transition point, based on which side of the line the ball $B(q, r)$ lies. Since $q$'s metric ball is bounded by $O(m)$ sides and each probe can be performed in $O(\log m)$ time, it follows that through binary search, we can compute the values of $\varrho_1$, $\varrho_2$, and $\varrho_q$ for a given edge in $O(\log^2 m)$ time.
Once we have established $\varrho_1(e)$, we ask whether the ball $B(p_2, \varrho_1(e))$ lies above or below the line through $e$. This reduces to computing the closest point between $p_2$ and this line, which, by the observations following Lemma 6, can be accomplished in $O(\log m)$ time. Based on this, we classify $e$ on $p_1$'s metric ball as being low or high. Since each probe of an edge of $p_1$'s metric ball takes $O(\log^2 m)$ time, in $O(\log^3 m)$ time, we can determine the transition point between low and high edges of $p_1$'s ball, whose existence is implied by Lemma 12.
In summary, in $O(\log^3 m)$ time, we can determine the low-high transition vertex for $p_1$'s metric ball. By repeating the process symmetrically for $p_2$ (and the cross-tangent $\ell_2$ to $B(q, r)$), we can identify the corresponding vertex for $p_2$'s metric ball.
To complete the process, we need only identify the virtual vertex associated with $q$'s metric ball. To do this, we classify the virtual edges of $q$'s metric ball as left, right, or infeasible. Let $e$ be any virtual edge of $q$'s metric ball. Apply the same procedure described above to determine $\varrho_q(e)$ and $\varrho_q'(e)$. If both return "infeasible", then $e$ is labeled as infeasible. If $\varrho_q(e)$ alone returns a feasible value, we declare $e$ to be left (see Figure 13). If $\varrho_q'(e)$ alone returns a feasible value, we declare $e$ to be right. If both return a feasible value, we classify $e$ as left or right depending on whether $\varrho_q(e)$ is greater than or less than $\varrho_q'(e)$, respectively. The vertex we desire is the one where the left-to-right transition occurs. The following lemma summarizes the observations made in this section.
Lemma 13.
Given the point configuration defined above, if they are not entirely infeasible, the relevant portions of the metric ball boundary for $q$ consist of a single contiguous left portion followed by a contiguous right portion.
Based on this lemma, we proceed to compute the vertex that separates the left and right portions in $O(\log^3 m)$ time, analogous to how we did this for the $p_1$ and $p_2$ cases. When this is done, we have computed the three virtual vertices for $p_1$, $p_2$, and $q$ that define the closest points for the Hilbert SVM line. Thus, we have eliminated the combinatorial element of the problem. All that remains is to solve the remaining numerical problem. The solution is quite technical, but the computation involves only a constant number of entities (six, to be precise), and so the running time is $O(1)$. The details are presented in the full version of the paper.
In summary, we have shown that we can compute the violation test in $O(\log m)$ time and we can evaluate the objective function in $O(\log^3 m)$ time. Given that both the violation test and the evaluation of the objective function can be performed in $O(\log^3 m)$ time, the overall running time is $O(n \log^3 m)$, and this completes the proof of Theorem 1.
