Tolerant Testers of Image Properties

We initiate a systematic study of tolerant testers of image properties or, equivalently, algorithms that approximate the distance from a given image to the desired property. Image processing is a particularly compelling area of applications for sublinear-time algorithms and, specifically, property testing. However, for testing algorithms to reach their full potential in image processing, they have to be tolerant, that is, resilient to noise. We design efficient approximation algorithms for the following fundamental questions: What fraction of pixels have to be changed in an image so it becomes a half-plane? A representation of a convex object? A representation of a connected object? More precisely, our algorithms approximate the distance to three basic properties (being a half-plane, convexity, and connectedness) within a small additive error ε, after reading poly(1/ε) pixels, independent of the image size. We also design an efficient agnostic proper PAC learner of convex sets (continuous and discrete) in two dimensions under the uniform distribution. Our algorithms require very simple access to the input: uniform random samples for the half-plane property and convexity, and samples from uniformly random blocks for connectedness. However, the analysis of the algorithms, especially for convexity, requires many geometric and combinatorial insights. For example, in the analysis of the algorithm for convexity, we define a set of reference polygons P_ε such that (1) every convex image has a nearby polygon in P_ε and (2) one can use dynamic programming to quickly compute the smallest empirical distance to a polygon in P_ε.


INTRODUCTION
Image processing is a particularly compelling area of applications for sublinear-time algorithms and, specifically, property testing. Images are huge objects, and our visual system manages to process them very quickly without examining every part of the image. Moreover, many applications in image analysis have to process a large number of images online, looking for an image that satisfies a certain property among images that are generally very far from satisfying it. Alternatively, they look for a subimage satisfying a certain property in a large image (e.g., a face in an image where most regions are part of the background). There is a growing number of proposed rejection-based algorithms that employ a quick test that is likely to reject a large number of unsuitable images (see, e.g., citations in Reference [27]).
Property testing [21,37] is a formal study of fast algorithms that accept objects with a given property and reject objects that are far from having it. Testing image properties in this framework was first considered in Reference [32]. Ron and Tsur [36] initiated property testing of images with a different input representation, suitable for testing properties of sparse images. Since these models were proposed, several sublinear-time algorithms for visual properties were implemented and used: namely, those by Kleiner et al. and Korman et al. [27-29].
However, for sublinear-time algorithms to reach their full potential in image processing, they have to be resilient to noise: Images are often noisy, and it is undesirable to reject images that differ only on a small fraction of pixels from an image satisfying the desired property. Tolerant testing was introduced by Parnas, Ron, and Rubinfeld [31] exactly with this goal in mind: to deal with noisy objects. It builds on the property testing model and calls for algorithms that accept objects that are close to having a desired property and reject objects that are far. Another related task is approximating the distance of a given object to the nearest object with the property within additive error ϵ. (Distance approximation algorithms imply tolerant testers with similar query complexity and running time and vice versa; see the remark after Definition 2.2.) The only image problem for which tolerant testers had been studied previously is the image partitioning problem investigated by Kleiner et al. [27].

Our Results
We design efficient approximation algorithms for the following fundamental questions: What fraction of pixels have to be changed in an image so it becomes a half-plane? A representation of a convex object? A representation of a connected object? In other words, we design algorithms that approximate the distance to being a half-plane, convexity, and connectedness within a small additive error or, equivalently, tolerant testers for these properties. (To get the complexity of (ϵ_1, ϵ_2)-tolerant testing, substitute ϵ = (ϵ_2 − ϵ_1)/2.) These problems were not investigated previously in the tolerant testing framework. For all three properties, we give ϵ-additive distance approximation algorithms that run in constant time (i.e., time dependent only on ϵ, but not on the size of the image). We remark that even though it was known that these properties can be tested in constant time [32], this fact does not necessarily imply constant-query tolerant testers for these properties. E.g., Fischer and Fortnow [20] exhibited a property (of objects representable with strings of length n) that is testable with a constant number of queries, but for which every tolerant tester requires n^{Ω(1)} queries. (Subsequent to the publication of the conference version of our article [8], this separation result was improved in a line of work [6,18,34] on separating property testing and tolerant testing from the intermediate model called erasure-resilient testing. Specifically, Ben-Eliezer et al. [6] gave a property that can be tested with a number of queries that is a function only of ϵ, but for which every tolerant tester requires Ω(n/log n) queries.) For convexity and connectedness, even the existence of distance approximation algorithms with query (or time) complexity independent of the input size does not follow from previous work. In particular, it does not follow from VC-dimension bounds, since the VC dimension of convexity and connectedness, even in two dimensions, depends on the input size. (For n × n images, the VC dimension of convexity is Θ(n^{2/3}), the maximum number of vertices of a convex lattice polygon in an n × n lattice [2]; for connectedness, it is Θ(n).) Implications of the VC-dimension bound for convexity are further discussed below.
Our results on distance approximation are summarized in Table 1. We remark that our algorithms for the half-plane property and convexity also work for the continuous versions of the problems, that is, when we are given a continuous figure instead of a discrete image. (See the definition of a figure in Section 2.) Our algorithm for convexity is the most important and technically difficult of our results, requiring a large number of new ideas to get running time polynomial in 1/ϵ. To achieve this, we define a set of reference polygons P_ϵ such that (1) every convex image has a nearby polygon in P_ϵ and (2) one can use dynamic programming to quickly compute the smallest empirical distance to a polygon in P_ϵ. It turns out that the empirical error of our algorithm is bounded from above by a quantity proportional to the sum of the square roots of the areas of the regions it considers in the dynamic program. To guarantee (2) and keep our empirical error small, our construction ensures that the sum of the square roots of the areas of the considered regions is small. This construction might be of independent interest.
Our algorithms do not need sophisticated access to the input image: uniformly sampled random pixels suffice for our algorithms for the half-plane property and convexity. For connectedness, we allow our algorithms to query pixels from a uniformly random block. (See the end of Section 2 for a formal specification of the input access.) Our algorithms for convexity and the half-plane property work by first implicitly learning the object. (There is a known implication from learning to testing: as proved in Reference [21], a proper PAC learning algorithm for a property P with sample complexity q(ϵ) implies a two-sided error (uniform) property tester for P that takes q(ϵ/2) + O(1/ϵ) samples. There is an analogous implication from proper agnostic PAC learning to distance approximation, with an overhead of O(1/ϵ^2) instead of O(1/ϵ). We choose to present our testers first and obtain the learners as corollaries, because our focus is on testing and because using the generic reduction here would be overkill.) PAC learning was defined by Valiant [40], and agnostic learning by Kearns et al. [25] and Haussler [23]. As a corollary of our analysis, we obtain fast proper agnostic PAC learners of half-planes and of convex sets in two dimensions that work under the uniform distribution. The sample and time complexity of the PAC learners is as indicated in Table 1 for the distance approximation algorithms for the corresponding properties when the error probability δ = 1/3. The results for general δ are stated in Corollaries 3.5 and 4.21.
While the sample complexity of our agnostic half-plane learner (and hence of our distance approximation algorithm for half-planes) follows from VC-dimension bounds, its running time does not. Agnostic learning of half-spaces under the uniform distribution has been studied in Reference [24], but only for the hypercube {−1, 1}^d domains, not the plane. Our PAC learner of convex sets, in contrast to our half-plane learner, beats the VC-dimension lower bound on sample complexity. (The sample complexity of a PAC learner for a class is at least proportional to the VC dimension of that class [19].) Since the VC dimension of convexity of n × n images is Θ(n^{2/3}), proper PAC learners of convex sets in two dimensions that work under arbitrary distributions must have sample complexity Ω(n^{2/3}). However, one can do much better with respect to the uniform distribution: Schmeltz [38] showed that a non-agnostic learner for this task needs Θ(ϵ^{−3/2}) samples.
Surprisingly, it appears that this question has not been studied at all for agnostic learners. Our agnostic learner for convex sets in two dimensions under the uniform distribution needs O((1/ϵ^2) log(1/ϵ)) samples and runs in time O(1/ϵ^8). For connectedness, we take a different approach. Our algorithm does not try to learn the object first; instead, it relies on a combinatorial characterization of the distance to connectedness. We show that the distance to connectedness can be represented as an average of distances of subimages to a related property.
Our algorithm uses an algorithm for computing the distance to connectedness exactly as a subroutine (run on images with side length O(1/ϵ)). The exact algorithm is presented in Section 5.3. Its running time is 2^{O(n)}. Note that there is a straightforward algorithm for exact computation that runs in time 2^{O(n^2)}: it goes over all images M′, checks whether each M′ is connected, and returns the minimum distance from M to a connected M′. Our algorithm's running time has O(n) in the exponent instead of O(n^2). Recently, Ben-Eliezer et al. [4] showed that computing the distance to connectedness is NP-hard, thus demonstrating that our exact algorithm cannot be improved significantly without a major breakthrough.
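To make the exponential baseline concrete, here is a minimal Python sketch (our own illustration, not the article's Algorithm 9) of the straightforward 2^{O(n^2)} computation: it enumerates all n × n images, keeps the connected ones, and returns the minimum Hamming distance. It assumes the usual convention that connectedness refers to the black pixels under 4-adjacency, and it is feasible only for tiny n.

```python
from itertools import product
from collections import deque

def is_connected(img, n):
    """Check whether the black pixels of a flat 0/1 image form one 4-connected component."""
    blacks = [(i, j) for i in range(n) for j in range(n) if img[i * n + j]]
    if not blacks:
        return True  # an all-white image is vacuously connected
    seen, queue = {blacks[0]}, deque([blacks[0]])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (i + di, j + dj)
            if 0 <= nb[0] < n and 0 <= nb[1] < n and img[nb[0] * n + nb[1]] and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(blacks)

def brute_force_distance_to_connectedness(M):
    """Absolute distance Dist(M, C): min Hamming distance to a connected image."""
    n = len(M)
    flat = tuple(M[i][j] for i in range(n) for j in range(n))
    best = n * n
    for candidate in product((0, 1), repeat=n * n):  # all 2^(n^2) images
        if is_connected(candidate, n):
            best = min(best, sum(a != b for a, b in zip(flat, candidate)))
    return best
```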
Finally, we show that the query complexity of our algorithms for approximating the distance to the nearest half-plane and to convexity is optimal up to a factor logarithmic in 1/ϵ. The proofs of the two lower bounds rely on the standard fact that Ω(1/ϵ_0^2) coin tosses are needed to distinguish a fair coin from a coin that lands heads with probability 1/2 + ϵ_0. The lower bounds are not limited to algorithms that sample input pixels uniformly at random; they apply to all algorithms that query input pixels.

Comparison to Other Related Work
Property testing has a rich literature on graphs and functions; however, prior to our work, properties of images had been investigated very little. Even though superficially the inputs to various types of testing tasks might look similar, the problems that arise are different. In the line of work on testing dense graphs, started by Goldreich et al. [21], the input is also an n × n binary matrix, but it represents the adjacency matrix of a dense input graph, so the problems considered are different from those in this work. In the line of work on testing geometric properties, started by Czumaj, Sohler, and Ziegler [17] and Czumaj and Sohler [16], the input is a set of points represented by their coordinates. The allowed queries and the distance measure on the input space are different from ours. In the line of work on testing convexity (and submodularity) of high-dimensional functions [3,11,13,30,35,39], convexity is defined differently than for geometric objects.
A line of work potentially relevant to understanding connectedness of images concerns connectedness of bounded-degree graphs. Goldreich and Ron [22] gave a tester for this property, subsequently improved by Berman et al. [11]. Campagna et al. [15] gave a tolerant tester for this problem. Even though we view our image as a graph to define connectedness of images, there is a significant difference in how distances between instances are measured (see Reference [32] for details). We also note that, unlike in Reference [15], our tolerant tester for connectedness is fully tolerant, i.e., it works for all settings of the parameters.
The only previously known tolerant tester for an image property was given by Kleiner et al. [27]. They consider the following class of image partitioning problems, each specified by a k × k binary template matrix T for a small constant k. The image satisfies the property corresponding to T if it can be partitioned by k − 1 horizontal and k − 1 vertical lines into blocks, where each block has the same color as the corresponding entry of T. Kleiner et al. prove that O(1/ϵ^2) samples suffice for tolerant testing of image partitioning properties. Note that the VC dimension of such a property is O(1), so the learning-to-testing implication discussed above yields an O(ϵ^{−2} log ϵ^{−1}) bound. Our algorithms required numerous new ideas to significantly beat the VC-dimension bounds (for convexity and connectedness) and to achieve low running time.
For the properties we study, distance approximation algorithms and tolerant testers were not investigated previously. In the standard property testing model, the half-plane property can be tested in O(ϵ^{−1}) time [32], convexity can be tested in O(ϵ^{−4/3}) time [10], and connectedness can be tested in O(ϵ^{−2} log ϵ^{−1}) time [11,32]. As we explained, property testers with running time independent of the input size do not necessarily imply tolerant testers with that feature. Many new ideas are needed to obtain our tolerant testers. In particular, the standard testers for the half-plane property and connectedness are adaptive, while the testers here use only random samples from the image, so the techniques used for analyzing them are different. The tester for convexity in Reference [10] uses only random samples, but it is not based on dynamic programming.
We note that, since the publication of the conference version of our article, it has been shown that convexity can be tested (in the standard property testing model) in O(1/ϵ) time, with an adaptive tester [9]. Also, testing of convexity in high dimensions has been investigated [12].
Finally, general classes of properties that include earthmover resilient properties [5], hereditary properties [1], and classes of matrices that are free of specified patterns [7] have been studied, but the new testers that apply to these general classes do not improve on the existing (even nontolerant) testers for the three properties considered in this work.

Open Questions
In this article, we give tolerant testers for several important problems on images. It is open whether the running time of these testers can be improved. We showed that our testers for half-plane and convexity are nearly optimal in terms of query complexity (up to a logarithmic factor in 1/ϵ). But it is open whether the query complexity of our connectedness tester can be improved.

Organization
We give formal definitions and notation in Section 2. Algorithms for being a half-plane, convexity, and connectedness are given in Sections 3, 4, and 5, respectively. The sections presenting algorithms for being a half-plane and convexity start by giving a distance approximation algorithm and conclude with the corollary about the corresponding PAC learner.
Image representation. We focus on black-and-white images. For simplicity, we only consider square images, but everything in this article can be easily generalized to rectangular images. We represent an image by an n × n binary matrix M of pixel values. We index the matrix by [0..n)^2.

Distance to a property. The absolute distance, Dist(M_1, M_2), between matrices M_1 and M_2 is the number of entries on which they differ. The relative distance between them is dist(M_1, M_2) = Dist(M_1, M_2)/n^2. A property P is a subset of binary matrices. The distance of an image represented by a matrix M to a property P is dist(M, P) = min_{M′ ∈ P} dist(M, M′). (We also sometimes use the absolute distance to a property, i.e., Dist(M, P) = dist(M, P) · n^2.) An image is ϵ-far from the property if its distance to the property is at least ϵ; otherwise, it is ϵ-close to it.
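As a quick illustration of these definitions, the following Python sketch (helper names are ours) computes Dist, dist, and the distance to a property given explicitly as a list of matrices:

```python
def Dist(M1, M2):
    """Absolute distance: number of entries on which M1 and M2 differ."""
    n = len(M1)
    return sum(M1[i][j] != M2[i][j] for i in range(n) for j in range(n))

def dist(M1, M2):
    """Relative distance: fraction of differing entries."""
    n = len(M1)
    return Dist(M1, M2) / n ** 2

def dist_to_property(M, prop):
    """dist(M, P) = min over M' in P of dist(M, M'); prop is an explicit list of matrices."""
    return min(dist(M, Mp) for Mp in prop)
```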
Computational tasks. We consider several computational tasks: tolerant testing [31], additive approximation of the distance to the property, and proper (agnostic) PAC learning [23,25,40]. Here, we define them specifically for properties of images.

Definition 2.2 (Distance Approximation Algorithm).
An ϵ-additive distance approximation algorithm for a property P is a randomized algorithm that, given an error parameter ϵ ∈ (0, 1/2) and access to an n × n binary matrix M, outputs a value d̂ ∈ [0, 1] that with probability at least 2/3 satisfies |d̂ − dist(M, P)| ≤ ϵ.
As observed by Parnas, Ron, and Rubinfeld [31], we can obtain an (ϵ_1, ϵ_2)-tolerant tester for any property P by running a distance approximation algorithm for P with ϵ = (ϵ_2 − ϵ_1)/2. Thus, all our distance approximation algorithms directly imply tolerant testers. In Reference [31, Claim 2], it is also shown that every tolerant tester that works for ϵ_2 − ϵ_1 = ϵ can be transformed into a distance approximation algorithm for the same property, with the sample complexity and running time increasing by a factor of O(log(1/ϵ) · log log(1/ϵ)). Thus, our lower bounds for distance approximation directly imply Ω̃(1/ϵ^2) lower bounds for tolerant testing of the half-plane property and convexity, where the Ω̃ notation hides factors polylogarithmic in 1/ϵ.

Remark 2.1. Note that, for every property, an algorithm that always outputs 1/2 approximates the distance of an image to the property with ϵ-additive error when ϵ ≥ 1/2. Thus, one can assume that ϵ ∈ (0, 1/2). Moreover, in this work, we assume that ϵ ∈ (0, 1/4). Every property we consider is satisfied by some unicolored image. For an image M, let M_u be a unicolored image closest to M. Note that, in every image, the fraction of white pixels is at most 1/2 or the fraction of black pixels is at most 1/2. Thus, for every property P considered in this work, dist(M, P) ≤ dist(M, M_u) ≤ 1/2. If ϵ ≥ 1/4, then there is a trivial ϵ-additive distance approximation algorithm for dist(M, P): always output 1/4. Therefore, we only consider the case when ϵ ∈ (0, 1/4).
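In code, the Parnas-Ron-Rubinfeld reduction above is a thin wrapper; the sketch below assumes a hypothetical distance-approximation oracle approx_dist(M, ϵ) with the guarantee from Definition 2.2:

```python
def tolerant_tester(M, eps1, eps2, approx_dist):
    """(eps1, eps2)-tolerant tester from an additive distance approximator.

    approx_dist(M, eps) is assumed to return, with probability >= 2/3,
    a value within eps of dist(M, P).
    """
    assert eps1 < eps2
    d_hat = approx_dist(M, (eps2 - eps1) / 2)
    # If dist(M, P) <= eps1, then d_hat <= eps1 + (eps2 - eps1)/2 = (eps1 + eps2)/2.
    # If dist(M, P) >= eps2, then d_hat >= eps2 - (eps2 - eps1)/2 = (eps1 + eps2)/2.
    return d_hat <= (eps1 + eps2) / 2  # accept iff the estimate is at most the midpoint
```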

Definition 2.3 (Proper Agnostic PAC Learner).
A proper agnostic PAC learning algorithm for a class P that works under the uniform distribution is given a parameter ϵ ∈ (0, 1/2) and access to an image M. It can draw independent uniformly random samples (i, j) and obtain (i, j) and M[i, j]. With probability at least 2/3, it must output an image M′ ∈ P such that dist(M, M′) ≤ dist(M, P) + ϵ.
Access to the input. A query-based algorithm accesses its n × n input matrix M by specifying a query pixel (i, j) and obtaining M[i, j]. The query complexity of the algorithm is the number of pixels it queries. A query-based algorithm is adaptive if its queries depend on the answers to previous queries and nonadaptive otherwise. A uniform algorithm accesses its n × n input matrix by drawing independent samples (i, j) from the uniform distribution over the domain (i.e., [0..n)^2) and obtaining (i, j) and M[i, j] for each sample. A block-uniform algorithm accesses its n × n input matrix by specifying a block length r ∈ [n]. For a block length r of its choice, the algorithm draws x, y ∈ [0..⌈n/r⌉) uniformly at random and obtains the set {(i, j) | ⌊i/r⌋ = x and ⌊j/r⌋ = y} and M[i, j] for all (i, j) in this set. The sample complexity of a uniform or block-uniform algorithm is the number of pixels of the image it examines.
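Both sample-based access models are easy to simulate given full access to M. The following sketch (our own code, not from the article) draws one labeled uniform sample and one labeled r-block:

```python
import random

def uniform_sample(M, n):
    """One labeled uniform sample: ((i, j), M[i][j])."""
    i, j = random.randrange(n), random.randrange(n)
    return (i, j), M[i][j]

def block_uniform_sample(M, n, r):
    """One labeled r-block: all pixels (i, j) with i // r == x and j // r == y."""
    blocks_per_side = -(-n // r)  # ceil(n / r)
    x, y = random.randrange(blocks_per_side), random.randrange(blocks_per_side)
    return [((i, j), M[i][j])
            for i in range(x * r, min((x + 1) * r, n))
            for j in range(y * r, min((y + 1) * r, n))]
```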
Black-and-white figures. In Sections 3 and 4, we will look at continuous versions of the objects we are studying. Let U = [0, n)^2 be the universe. A (black-and-white) figure is defined by a set C ⊆ U such that all points in C are black and all points outside C are white. We can think of a black object C on a white background U \ C. For any region R, we use A(R) to denote its area. For two figures F_1 and F_2, let SD(F_1, F_2) denote their symmetric difference, and let dist(F_1, F_2) = A(SD(F_1, F_2))/n^2. The distance to a property of figures is defined analogously to that for images.
Poisson algorithms. Uniform algorithms have access to independent (labeled) samples from the uniform distribution over the domain. Sometimes it is easier to analyze Poisson algorithms, which have access only to (labeled) Poisson samples from the image: namely, q = Po(s) pixels are drawn from the image uniformly and independently at random, where s is the sample parameter and Po(s) is a Poisson random variable with expectation s. The Poisson distribution with parameter λ ≥ 0, denoted Po(λ), takes each value x ∈ N with probability e^{−λ} λ^x / x!. The expectation and variance of a random variable distributed according to Po(λ) are both λ.
Definition 2.4. A Poisson-s tester is a uniform tester that takes a random number of samples distributed as Po(s).
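A Poisson-s sampler just draws k ~ Po(s) and then k independent labeled uniform samples; a minimal sketch (the inversion sampler below is adequate only for moderate s):

```python
import math
import random

def sample_poisson(lam):
    """Sample from Po(lam) by inversion; adequate for moderate lam."""
    x, p = 0, math.exp(-lam)
    cum, target = p, random.random()
    while cum < target:
        x += 1
        p *= lam / x
        cum += p
    return x

def poisson_samples(M, n, s):
    """Draw Po(s) labeled uniform samples from the n x n image M."""
    out = []
    for _ in range(sample_poisson(s)):
        i, j = random.randrange(n), random.randrange(n)
        out.append(((i, j), M[i][j]))
    return out
```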

DISTANCE APPROXIMATION TO THE NEAREST HALF-PLANE IMAGE
An image is called a half-plane if there exist an angle φ ∈ [0, 2π) and a real number c such that every pixel (x, y) is black in the image iff x cos φ + y sin φ ≥ c. The line x cos φ + y sin φ = c, denoted L_c^φ, is a separating line of the half-plane image, i.e., it separates the black and white pixels of the image. We call φ the direction of the half-plane image (and of the line L_c^φ). Note that φ is the oriented angle between the x-axis and a line perpendicular to L_c^φ. ... Our algorithm samples pixels uniformly at random and outputs the empirical distance to the closest reference half-plane image. The core property of M_ϵ is that the smallest empirical distance to a half-plane image in M_ϵ can be computed quickly. For every reference direction, we space the separating lines of the reference half-plane images a distance a = ϵn/√2 apart. By definition, there are at most √2·n/a = 2/ϵ reference half-plane images for each direction in D_ϵ and, consequently, |M_ϵ| ≤ 2|D_ϵ|/ϵ. (1)

ALGORITHM 1: Distance approximation to being a half-plane.
input: parameters n ∈ N, ϵ ∈ (24/n, 1/4); uniform access to an n × n binary matrix M.
1 Sample a set S of s = (10/ϵ^2) ln(9/ϵ) pixels uniformly at random with replacement.
2 Let D_ϵ, M_ϵ be the sets of reference directions and half-planes, respectively (see Definition 3.1), and let a = ϵn/√2.
3 For each reference direction φ ∈ D_ϵ:
4 Assign each sample (x, y) ∈ S to bucket j = ⌊(x cos φ + y sin φ)/a⌋.
5 For each bucket j, compute w_j and b_j, the number of white and black sampled pixels it has.
6 For each reference half-plane image in M_ϵ with direction φ, compute the fraction of samples it misclassifies, using the bucket counts w_j and b_j.
7 Output d̂, the minimum of the values computed in Step 6.
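The following Python sketch mirrors our reconstruction of Algorithm 1: for each reference direction it buckets the samples by their projections, then sweeps the candidate separating lines (the bucket boundaries) and counts misclassified samples via prefix sums of the bucket counts. The exact constants and the set of reference directions are assumptions, not necessarily the article's values.

```python
import math
import random

def approx_dist_to_halfplane(M, n, eps):
    """Estimate dist(M, half-plane) over reference half-planes; a sketch."""
    s = int((10 / eps ** 2) * math.log(9 / eps)) + 1
    samples = [(random.randrange(n), random.randrange(n)) for _ in range(s)]
    a = eps * n / math.sqrt(2)                        # spacing of separating lines
    num_buckets = int(2 * math.sqrt(2) * n / a) + 2   # projections lie in [-sqrt(2)n, sqrt(2)n]
    directions = [k * eps for k in range(int(2 * math.pi / eps) + 1)]  # assumed D_eps
    best = 1.0
    for phi in directions:
        white, black = [0] * num_buckets, [0] * num_buckets
        for (x, y) in samples:
            proj = x * math.cos(phi) + y * math.sin(phi)
            j = min(max(int(proj // a) + num_buckets // 2, 0), num_buckets - 1)
            (black if M[x][y] else white)[j] += 1
        # Hypothesis "buckets < t are white, buckets >= t are black":
        # misclassified = black samples below t + white samples at or above t.
        total_white = sum(white)
        black_prefix, white_prefix = 0, 0
        for t in range(num_buckets + 1):
            mis = black_prefix + (total_white - white_prefix)
            best = min(best, mis / s)
            if t < num_buckets:
                black_prefix += black[t]
                white_prefix += white[t]
    return best
```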
Proof. We define (continuous) half-plane figures and the set of reference half-plane figures corresponding to the set M_ϵ.

Definition 3.2 (Half-planes and Reference Half-planes). For all
Recall that A(R) denotes the area of a region R and SD(F_1, F_2) denotes the symmetric difference of figures F_1 and F_2.

Proof. Consider a half-plane H
Hence, the sum of their areas is at most ϵn^2/2.
If exactly one of the regions is a quadrilateral, as illustrated in Figure 3, then, applying the same reasoning as before to each pair of regions, we get that the sum of their areas is at most φ_1 n^2 + φ_2 n^2 ≤ ϵn^2/2. If both regions are quadrilaterals, then we add a line for each of them and apply the same reasoning as before to the three resulting pairs of regions. Again, the area of the symmetric difference of H and the closest reference half-plane figure is at most ϵn^2/2. ... By a uniform convergence bound (see, e.g., Reference [14]), since s ≥ (5/ϵ^2)(ln |M_ϵ| + ln 6) for all ϵ ∈ (0, 1/4), we get that, with probability at least 2/3, ... For the agnostic learner, we modify the algorithm to output, along with d̂ = min_{M′ ∈ M_ϵ} d̂(M′), a reference half-plane M̂ for which d̂(M̂) = d̂. Using the fact that s ≥ (5/ϵ^2)(ln |M_ϵ| + ln(2/δ)), a uniform convergence bound, and the rest of the analysis of Algorithm 1, we obtain that, with probability at least 1 − δ, the output M̂ satisfies dist(M, M̂) ≤ d_M + ϵ.

DISTANCE APPROXIMATION TO THE NEAREST CONVEX IMAGE
An image is convex if the convex hull of all black pixels contains only black pixels. In this section, we give an algorithm that approximates the distance of an image to convexity. Its performance is summarized in Theorem 4.1. The general approach of our algorithm (Algorithm 2) is similar to that of Algorithm 1, which approximates the distance to being a half-plane. We define a set P_ϵ of convex reference polygons. Algorithm 2 implicitly learns a nearby reference polygon and outputs the empirical distance from the input figure to that reference polygon. The key features of P_ϵ are that (1) every convex figure has a nearby polygon in P_ϵ, and (2) one can use dynamic programming (DP) to quickly compute the smallest empirical distance to a polygon in P_ϵ. Even though the number of reference polygons is large, all of them are composed of special regions. The special regions are triangles and quadrilaterals, and their total number is polynomial in 1/ϵ. The main idea of the dynamic programming algorithm is to recursively partition the reference polygons into regions for which the decisions about the best coloring that fits the input can be made independently.
We start by defining reference directions, lines, points, and line-point pairs that are later used to specify our DP instances. Reference directions are almost the same as in Definition 3.1; see Figure 4.1. For simplicity, we assume that 1/ϵ is an integer; otherwise, we run Algorithm 2 with a slightly smaller ϵ that satisfies the requirement. This does not change the asymptotic complexity of the algorithm. A line-point pair is a pair (ℓ, b), where ℓ is a reference line and b is a reference point w.r.t. ℓ. (Note that there could be reference points on ℓ that were defined w.r.t. some other reference line. This is why we say "a reference point w.r.t. ℓ," and not "a reference point on ℓ.") Roughly speaking, a reference polygon represents a polygon whose vertices are defined by line-point pairs. There are additional restrictions that stem from the fact that we need to be able to efficiently find a polygon close to an input figure. The actual definition specifies which actions we can take while constructing a reference polygon. Reference polygons are built starting from reference boxes, which are defined next. A reference box is specified by reference lines ℓ_0, ℓ_1, ℓ_2, ℓ_3, where ℓ_1 and ℓ_3 are distinct vertical lines such that ℓ_1 is to the left of ℓ_3. Each reference box defines a vertex set B_0 and a triangle set T_0.

Definition 4.1 (Reference Lines, Line-point Pairs).
Note that line-point pairs do not depend on the input. Intuitively, by picking a reference box, we decide to keep the area inside the quadrilateral b_0 b_1 b_2 b_3 black, the area outside the rectangle formed by ℓ_0, ℓ_1, ℓ_2, ℓ_3 white, and the triangles in T_0 gray, i.e., undecided for now. Reference polygons are defined next. Intuitively, to obtain a reference polygon from a reference box, we keep subdividing "gray" triangles in T_0 into smaller triangles and deciding to color the smaller triangles black or white or to keep them gray (i.e., undecided for now). We also allow "cutting off" a quadrilateral that is adjacent to black and coloring it black (a.k.a. "the base change operation"). The main recoloring operation from Definition 4.4 is illustrated in Figure 4.5. Even though the definition of reference polygons is somewhat technical, the readers can check their understanding of this concept by following Algorithm 2, as it chooses the best reference polygon to approximate the input figure.

Definition 4.4 (A Reference Polygon).
A reference polygon is a figure defined by Hull(B), where the set B can be obtained from a reference box with a vertex set B_0 and a triangle set T_0 by the following recursive process. Initially, T_end = ∅ and B = B_0. While T_0 ≠ ∅, move a triangle T from T_0 to T_end and perform the following steps: ... (2) (Subdivision Step) If h > 5ϵ_0 n, then choose whether to proceed with this step or go directly to Step 3 (both choices correspond to a legal reference polygon); otherwise, go directly to Step 3.
Let φ be the angle between ℓ(b_0, b_0′) and the x-axis, and let φ̂ ∈ D_ϵ be such that |φ̂ − φ| ≤ ϵ_0/2. ... The set of all reference polygons is denoted P_ϵ.
Observe that, for every reference polygon, at any time during its construction, the set B consists of points in convex position. Moreover, each triangle stored in T_0 contains points that can be added to B without violating this invariant. This allows us to make coloring decisions for each triangle of T_0 independently. By Remark 2.3, to prove Theorem 4.2, it suffices to design a Poisson algorithm with the required guarantees in expectation. Our Poisson algorithm is Algorithm 2. In Algorithm 2, we use the notation for the (relative) empirical error defined next. ...

ALGORITHM 2: Distance approximation to convexity.
input: parameters n ∈ N, ϵ ∈ (32/n, 1/4); uniform sample access to a figure F.
1 Set s = Θ((1/ϵ^2) log(1/ϵ)). Sample Po(s) points from F uniformly and independently at random.
// Next, compute d̂, the smallest fraction of samples misclassified by a reference polygon in P_ϵ. A dynamic programming implementation of the steps below is given in Section 4.3.
Similarly to Steps 5-8, compute d̂_right. // The best error for the region to the right of b_0 b_2, between ℓ_0 and ℓ_2.
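Operationally, the empirical distance d̂(P) to a fixed convex polygon P is just the fraction of labeled samples whose color disagrees with P. A sketch of that subcomputation (our own helper; the article's DP avoids enumerating polygons explicitly):

```python
def inside_convex(poly, p):
    """Test whether point p lies in the convex polygon poly (vertices in CCW order)."""
    x, y = p
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False  # p is strictly to the right of edge (x1,y1)->(x2,y2)
    return True

def empirical_distance(samples, poly):
    """d-hat(P): fraction of labeled samples ((x, y), is_black) misclassified by P."""
    mis = sum(1 for (p, is_black) in samples if is_black != inside_convex(poly, p))
    return mis / len(samples)
```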
Our set of reference polygons has several crucial features. First, for each convex figure there is a nearby reference polygon; this is proved in Section 4.1. It turns out that the empirical error for a region is bounded from above by a quantity proportional to the square root of the area of that region. The second key feature of our reference polygons is that, for each of them, the set of considered triangles, T_end (in the construction of the reference polygon), has small Σ_{T ∈ T_end} √(A(T)). The proof of this fact, as well as the analysis of the empirical error, appears in Section 4.2. Finally, Section 4.3 completes the analysis of the algorithm, gives details of its implementation (including the use of dynamic programming to quickly compute the smallest empirical distance to a reference polygon), explains how to adapt it to images instead of figures, and presents the corollary about agnostic PAC learning of convex objects.

Existence of a Nearby Reference Polygon
Proof. Consider a convex figure F. We will show how to construct a reference polygon P close to F by following the recursive process in Definition 4.4 and specifying how to instantiate the choices in the process. The construction of P is described next. First, we obtain a reference box. Next, we construct the set B from the reference box, as in Definition 4.4. We also maintain two sets of triangles, T_cut and T_fin, and two sets of line segments, L_cut and L_fin, that are used in the analysis. Initially, T_cut = T_fin = L_cut = L_fin = ∅. The colors of the points in the description below are with respect to the figure F. The boundary of F is the boundary of the closure of the set of black points in F. This is how we make the choices at each step of the recursive process in Definition 4.4 to obtain our reference polygon and how we construct L_cut and L_fin: (1) (Base Change) Choose b_0, b_0′ to be the black reference points w.r.t. the lines ℓ(b′, v) and ℓ(b′′, v). ... At the end of the process, we obtain a reference polygon P. Observe that each segment in L_cut ∪ L_fin has both endpoints on the boundary of F. Moreover, if we list all endpoints of segments in L_cut ∪ L_fin in the order they appear on the boundary (say, clockwise), then the two endpoints of each segment appear next to each other in this list.

Proof. Let ℓ_0, ℓ_1, ℓ_2, ℓ_3 be the lines defining the reference box of P. Recall the definition of the regions W_i from Step 2 of Algorithm 2. Let C_i, for i = 0, 1, 2, 3, be the closure of the set of points in W_i. Observe that C_0 is convex because it is the intersection of two convex sets. By definition of ℓ_0, the set C_0 contains no reference points w.r.t. horizontal reference lines, that is, no points with both coordinates divisible by ϵ_0 n. We rescale F by mapping each point (x, y) to (x/(ϵ_0 n), y/(ϵ_0 n)) and apply the Invisibility Lemma (Fact 4.9) with m = 1/ϵ_0 to obtain that A(C_0) ≤ m · (ϵ_0 n)^2 = ϵ_0 n^2. (Recall that 1/ϵ is an integer; thus, 1/ϵ_0 is also an integer.) Applying the same argument, we get that A(C_i) ≤ ϵ_0 n^2 for i ∈ [3] as well. Therefore, Err(R_out) ≤ 4ϵ_0 n^2.
Next, we show that the error in every triangle in T_fin is small. ... is at most h. Consequently, ... Observe that all triangles in T_fin are right or obtuse. Let ℓ ∈ L_φ̂ be as defined in Fact 4.7, with the additional condition that ℓ is closest to b_0 b_0′ among all lines that fit the definition in Fact 4.7.
Let ℓ̄ be the line parallel to ℓ that passes through the point b_0′, as shown in Figure 4.11. Let u be the intersection point of ℓ̄ and the line segment b_0 v.
We bound the error as follows: ... Recall that the segment y y′ contains no reference point w.r.t. ℓ, since b_0 b_0′ v is in T_fin. Consequently, |y y′| < ϵ_0 n. Putting this together with Fact 4.7, we get: ... The distance between ℓ and ℓ̄ is less than ϵ_0 n; otherwise, there would be a line that fits the definition in Fact 4.7 and is closer to b_0 b_0′ than ℓ. Then ... The second inequality follows because the angle b_0 u b_0′ is obtuse. Denote the angle between ℓ̄ and b_0 b_0′ by γ. Then γ ≤ ϵ_0/2, by definition of ℓ and ℓ̄. By Fact 4.12, ... It remains to find upper bounds on the quantities obtained in inequalities (3)-(5) in terms of |a a′|. Since, by definition of b_0, the segment b_0 a contains no reference points w.r.t. ℓ(b_0, v), we get that |b_0 a| < ϵ_0 n. Similarly, |b_0′ a′| < ϵ_0 n. By the triangle inequality, |b_0 b_0′| < |a a′| + 2ϵ_0 n. By Fact 4.11, since h > 5ϵ_0 n, we get that |b_0 b_0′| > 10ϵ_0 n and thus |a a′| ≥ |b_0 b_0′| − 2ϵ_0 n > 8ϵ_0 n. Putting together inequalities (2)-(5) and then using the facts that |b_0 b_0′| < |a a′| + 2ϵ_0 n and that ϵ_0 n < |a a′|/8, we obtain: ... This completes the proof of Proposition 4.6.
Next, we show that the error in every triangle in T_cut is small. ... Proof. If |y y′| ≤ ϵ_0 n, then, by Fact 4.7, ... Now assume that |y y′| > ϵ_0 n. Let ℓ′ ∈ L_φ̂ be the line at distance ϵ_0 n from ℓ that is closer to v than ℓ. Let x and x′ be the points of intersection of the boundary of F with ℓ′. Then |x x′| ≤ ϵ_0 n, since, by construction of ℓ′, the segment x x′ contains no reference point w.r.t. ℓ′. By inequality (6) and Fact 4.7, ... The last inequality holds since |y y′| > ϵ_0 n. This completes the proof of Proposition 4.8.
Figures F and P differ only on the regions outside the reference box and inside the triangles in T_cut ∪ T_fin (intuitively, "above" the line segments in L_cut ∪ L_fin). All the line segments in L_cut ∪ L_fin are sides of a convex polygon in an n × n square. Thus, the sum of their lengths is at most 4n. By ... We have shown that, for every convex figure F, there exists a reference polygon P ∈ P_ϵ such that dist(F, P) ≤ ϵ/4, completing the proof of Lemma 4.3. ... In other words, T(S) is the set that consists of all the points (on and between the lines x = a and x = b) that are on or below the curve h(x).

Geometric Facts Used in Section 4.1.
Since S is closed, so is S′. To prove that S′ is convex, it is enough to show that h(x) is concave on [a, b]. Recall that f(x) is concave and g(x) is convex on [a, b]. Therefore, −g(x) is concave, and h(x), the sum of two concave functions, is also concave on [a, b].
To see that (a, 0) ∈ S′, observe that (a, g(a)) ∈ S. ... Fact 4.11. Let T be a triangle with sides a, b, and c. Let α be the angle opposite to the side a, and let h_a be the height w.r.t. the base a in T. If α ≥ π/2, then h_a ≤ a/2.

Proof. By the cosine theorem, a^2 = b^2 + c^2 − 2bc·cos α. Since α ≥ π/2, we have cos α ≤ 0, so a^2 ≥ b^2 + c^2 ≥ 2bc. The area of T equals (1/2)·bc·sin α = (1/2)·a·h_a, so h_a = (bc·sin α)/a ≤ bc/a ≤ a/2. Thus, h_a ≤ a/2, as claimed.

Error Analysis
We start this section by proving the second crucial property of our set of reference polygons, in Lemma 4.13. After that, we prove the main result of this section, Lemma 4.16, which demonstrates that our sample, with high constant probability, is sufficiently accurate for all reference polygons.
Lemma 4.13. For each set T_end obtained in the construction of a reference polygon in Definition 4.4, ...

Proof. All triangles in T_end are obtained by partitioning the four initial triangles in T_0. The following claim analyzes how the area is affected by one step of partitioning.

Claim 4.14. Let T′ and T′′ be two gray triangles obtained from a triangle T in the Subdivision Step of Definition 4.4.

Proof. Observe that A(T′) + A(T′′) is maximized when ..., so we prove the lemma for this case. We use the notation from ... Then, ...
Let h be the height of the triangle v_1 b v′ from v′ to the side v_1 b. Let α = ∠v_1 b v′. By the construction of T, the angle ∠b_0 v b_0′ is either right or obtuse. Therefore, ∠v_1 v′ b is obtuse and |v′ b| ≤ |v_1 b|. ... The last inequality holds because α ≤ ϵ_0/2 and a ≤ √2·n (since a is the length of a segment inside an n × n square). It also uses the fact that sin α ≤ α for α ≥ 0. Observe that A(T′) = A(T_1) + A(v_1 b v′) and A(T′′) ≤ A(T_2). By Equations (7) and (8), ...
Since the geometric mean of two numbers is at most their arithmetic mean, ... By Proposition 4.15 and the concavity of the square root function, ... This completes the proof of Claim 4.14.

Let A_1, ..., A_4 be the areas of the four initial triangles in T_0. Then Σ_{i=1}^{4} A_i ≤ n^2. By the construction of the triangles in T_end, Claim 4.14, and the concavity of the square root function, ... This completes the proof of Lemma 4.13.
Recall that our algorithm samples Po(s) points, where s is the parameter of the algorithm, uniformly and independently at random from an input figure F and computes the minimum empirical distance to a reference polygon in P_ϵ. Let S be the set of samples obtained by the algorithm. Recall that for any black-and-white figure F′, we defined d̂(F′) = (1/s) · |{u ∈ S : ...}|. ... Proof. Consider a region R = (R^−, R^+), partitioned into two regions R^− and R^+, such that in some step of Algorithm 2, or of a subroutine called by it, we are checking the assumption that R^− is white and R^+ is black, i.e., evaluating d̂^−(R^−) + d̂^+(R^+). Specifically, this is done in Step ...

Proof. Consider the random variable ... where α = cϵn/√(A(Γ)). If α ≥ 1, then, since X is always at least 0, we obtain that ... Moreover, by Chernoff bounds for Poisson random variables, ... Thus ... and the claim holds. Now assume that α < 1. By Chernoff bounds for Poisson random variables, ... There are ... ways to choose a reference line and two reference points. Overall, the number of quadrilaterals in R is at most π^2 k^8.
Finally, regions of type 4 are contained in triangles of the form b_0 b_0′ v; they are of the form ... In the former two cases, regions are defined by two line-point pairs (ℓ(b_0, v), ...). ... By taking a union bound over all regions in R and applying Claims 4.17-4.18, we get that the probability that for one or more of them the error is larger than stated in Claim 4.17 is at most ..., where the last inequality holds provided that s ≥ C·(1/ϵ^2)·ln(1/ϵ) for some sufficiently large constant C. We get that ... Now suppose that the event in inequality (10) holds, that is, the error is low for all regions. Fix a reference polygon P and consider its partition into regions R = (R^+, R^−) ∈ R on which the Poisson algorithm evaluates d̂^+(R^+) + d̂^−(R^−) while implicitly computing d̂(P). Let R_P ⊂ R be the set of regions in the partition. Recall the four types of regions from the proof of Claim 4.18. Then R_P contains one region of type 1 and two regions of type 2, defined by the reference box of P. Denote their areas by A_1, A_2, A_3. For each triangle T ∈ T_end created during the construction of P in Definition 4.4, the set R_P contains at most one region of type 3 (a black quadrilateral) and at most one region of type 4. They were implicitly colored, respectively, in Item 1 and Items 2-3 of Definition 4.4, when triangle T was processed. Let A_T and A_T′ denote their respective areas.
Note that the upper bound on the area of the misclassified set in a region R is A(R). Since the event in inequality (10) holds, ... Since Σ_{j=1}^{3} A_j ≤ n^2 and A_T + A_T′ ≤ A(T) for all T ∈ T_end, by the concavity of the square root function, ... We substitute these expressions into inequality (11), use Lemma 4.13, and recall that c = 1/24: ... This holds for all reference polygons P as long as the event in inequality (10) happens, i.e., with probability at least 2/3. This completes the proof of Lemma 4.16.

Analysis of Accuracy of Algorithm 2.
Let d_F be the distance of a figure F to convexity. Then there exists a convex figure F* such that dist(F, F*) = d_F. By Lemma 4.3, there is a reference polygon P* such that dist(F*, P*) ≤ ϵ/4. By the triangle inequality, dist(F, P*) ≤ dist(F, F*) + dist(F*, P*) ≤ d_F + ϵ/4. Algorithm 2 returns the estimate d̂ = min_{P ∈ P_ϵ} d̂(P), that is, the smallest fraction of samples misclassified by a reference polygon. By Lemma 4.16, with probability at least 2/3 over the choice of the samples taken by Algorithm 2, |d̂(P) − dist(F, P)| ≤ 3ϵ/4 for all reference polygons P ∈ P_ϵ. Suppose this event happened. Then d̂ ≤ d̂(P*) ≤ dist(F, P*) + 3ϵ/4 ≤ d_F + ϵ. Now consider a reference polygon P̂ for which d̂ = d̂(P̂). For this polygon, d̂ = d̂(P̂) ≥ dist(F, P̂) − 3ϵ/4 ≥ d_F − 3ϵ/4. The last inequality holds because the distance from F to any convex figure, including P̂, is at least d_F. As required, |d̂ − d_F| ≤ ϵ with probability at least 2/3.

Sample and Time Complexity of Algorithm 2.
The number of samples taken by the algorithm is a Poisson random variable with parameter s = O(ϵ^{−2} log ϵ^{−1}), so the expected number of samples is O(ϵ^{−2} log ϵ^{−1}). By standard arguments, the algorithm can be converted to one that takes O(ϵ^{−2} log ϵ^{−1}) samples in the worst case and has the same accuracy guarantee.
Next, we explain how to implement the resulting algorithm to run in time O(ϵ^{−8}). Refer to Figure 4.5. Each instance triangle b′ b′′ v of the dynamic programming in the subroutines BestForFixedBase and Best is specified by two line-point pairs (ℓ(b′, v), b′) and (ℓ(b′′, v), b′′). ... Observe that Algorithms 2 and 5 have the same query and time complexity. By Theorem 4.2, with probability at least 2/3, Algorithm 2 approximates the distance of M to convexity within additive error (7/8) · ϵ. Next, we show that, with probability at least 2/3, Algorithm 5 approximates the distance of M to convexity within additive error ϵ.
Recall that d_F denotes the distance from a figure F to convexity. Analogously, we define d_M for an image M. ... Observe that F* and M_{F*} differ only on the unit squares that contain points from the boundary of F*. The next claim is paraphrased from Reference [32, Lemma 1]; see Reference [32] for the proof.
Since |d̂ − d_M| ≤ (7/8) · ϵ with probability at least 2/3, we get that, with probability at least 2/3, ... where we used the fact that ϵ ≥ 32/n. This concludes the proof of Theorem 4.1.

Agnostic Learning of Convexity.

Proof. We set the sample size in Algorithm 2 to s = C · (1/ϵ^2) · ln(1/δ), where C is a sufficiently large constant, and modify Algorithm 2 to output, along with d̂, a reference polygon P̂ with d̂(P̂) = d̂. With an additional DP table, we can compute which points became its vertices. In this case, in the analysis of Algorithm 2, inequality (9) in Lemma 4.16 becomes ... Therefore, the phrase "with probability at least 2/3" becomes "with probability at least 1 − δ" throughout the analysis of the algorithm. Thus, with probability at least 1 − δ, the output P̂ satisfies dist(F, P̂) ≤ d_F + ϵ. This gives an agnostic learner for convex figures with running time ... To get an agnostic learner for convex images, we similarly keep an additional DP table while running Algorithm 5.

Border Connectedness
The first idea in our algorithms for connectedness is that we can modify an image in relatively few places by superimposing a grid on it. The pixels that do not lie in any of the squares of S_r, i.e., pixels (i, j) where i or j is divisible by r, are called grid pixels. The set of all grid pixels is denoted GP_r.

Proof. |GP_r| = 2((n−1)/r + 1)n − ((n−1)/r + 1)^2 < 2((n−1+r)/r)n ≤ 2((n−1+n−1)/r)n = 4((n−1)/r)n < 4n^2/r.
Note that a square consists of pixels of an r -block, with the pixels of the first row and column removed. Therefore, a block-uniform algorithm can obtain a uniformly random r -square.

Definition 5.2 (Border Connectedness).
The border of a k × k (sub)image is the set of its pixels in the first and last rows and columns. A (sub)image S is border-connected if, for every black pixel (i, j) of S, the image graph G_S contains a path from (i, j) to a pixel on the border of S. The property border connectedness, denoted C′, is the set of all border-connected images.
The absolute distance to border connectedness of a square S is equal to the absolute distance to connectedness of a related square, defined next.

Definition 5.3. Let S be a k × k square. Define S′ to be the (k + 2) × (k + 2) square whose border pixels are all black and whose remaining pixels are the pixels of S. Intuitively, S′ is the image S with a square black frame around it.

Next, we show that the absolute distance of a square S to border connectedness is equal to the absolute distance of S′ to connectedness.

Lemma 5.3. Dist(S, C′) = Dist(S′, C) for every square S.

Proof. Consider a k × k square S and let S̃ be a border-connected square closest to S. Note that S̃′ is a connected square of size (k + 2) × (k + 2) and, consequently, Dist(S′, C) ≤ Dist(S′, S̃′) = Dist(S, S̃) = Dist(S, C′). Observe that to get a connected (k + 2) × (k + 2) square from S′ with as few modifications as possible, the pixels of S′ on its border should not be modified. This is because connecting a connected component to the border takes at most (k + 2)/2 modifications, but removing the entire border takes 4k + 4 modifications. Therefore, the closest connected square to S′ has only black pixels on its border. Thus, there is a k × k border-connected square S̃ such that S̃′ is a connected image closest to S′ and, consequently, Dist(S, C′) ≤ Dist(S, S̃) = Dist(S′, S̃′) = Dist(S′, C). Therefore, Dist(S, C′) = Dist(S′, C), as desired.
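Definition 5.3 and Lemma 5.3 translate directly into code: frame the square in black and reuse any exact distance-to-connectedness routine (passed in below as a hypothetical exact_distance function, e.g., Algorithm 9 or the brute-force sketch from the introduction):

```python
def add_black_frame(S):
    """S' from Definition 5.3: S with a black frame, size (k+2) x (k+2)."""
    k = len(S)
    framed = [[1] * (k + 2)]
    for row in S:
        framed.append([1] + list(row) + [1])
    framed.append([1] * (k + 2))
    return framed

def distance_to_border_connectedness(S, exact_distance):
    """Dist(S, C') = Dist(S', C), by Lemma 5.3; exact_distance is an assumed helper."""
    return exact_distance(add_black_frame(S))
```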

Proof of Theorem 5.1
The main idea behind Algorithm 6, used to prove Theorem 5.1, is to relate the distance to connectedness to the distance to another property, which we call grid connectedness. The latter distance is the average, over squares, of the distances of these squares to border connectedness (see Definition 5.2). The average can be easily estimated by looking at a sample of the squares. In Section 5.3, we give an algorithm that computes the distance of any image to connectedness. We can use that algorithm and Lemma 5.3 to find the distance of a square to border connectedness. W.l.o.g. assume that n ≡ 1 (mod 8/ϵ). (Otherwise, we can pad the image with white pixels without changing whether it is connected and adjust the accuracy parameter.)

ALGORITHM 6: Distance approximation to connectedness.
input: n ∈ N and ϵ ∈ (0, 1/4); block-sample access to an n × n binary matrix M.
1 Sample s = 4/ϵ^2 squares uniformly and independently from S_{8/ϵ} (see Definition 5.1). // This can be done by drawing random blocks from the 8/ϵ-partition of [0..n)^2.
2 For each sampled square S, compute dist(S, C′) = Dist(S′, C) · (8/ϵ − 1)^{−2}, where C′ is border connectedness (see Definitions 5.2 and 5.3), using Algorithm 9 from Section 5.3 with input S′. Let d̂_squares be the average of the computed distances dist(S, C′).
Finally, observe that to make M_ϵ satisfy C_ϵ, it is necessary and sufficient to ensure that each square satisfies C′. In other words, ... Since |S_{8/ϵ}| = ((n−1)/(8/ϵ))^2, the desired expression for d_ϵ follows.
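In code, Algorithm 6 is a plain sample average. The sketch below assumes two helpers, sample_square (block-uniform access returning a uniformly random square of S_{8/ϵ}) and dist_to_border_conn (the absolute distance of a square to border connectedness, as in Step 2); the additive correction for grid pixels is omitted:

```python
def approx_dist_to_connectedness(M, n, eps, sample_square, dist_to_border_conn):
    """Average, over s random squares, of their relative distance to border
    connectedness (Steps 1-2 of Algorithm 6); grid-pixel correction omitted.

    sample_square(M, n, r) and dist_to_border_conn(S) are assumed helpers.
    """
    r = int(8 / eps)
    s = int(4 / eps ** 2)
    total = 0.0
    for _ in range(s):
        S = sample_square(M, n, r)                 # a uniformly random square of S_r
        k = len(S)                                 # k = r - 1 pixels per side
        total += dist_to_border_conn(S) / k ** 2   # absolute -> relative distance
    return total / s
```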
The most expensive step in Algorithm 6 is Step 2, where the distance of a square S to border connectedness is calculated. By Theorem 5.5 (see Section 5.3), the running time of this step for one square is exp(O(1/ϵ)), and the step is executed O(1/ϵ^2) times. Therefore, the running time of Algorithm 6 is exp(O(1/ϵ)), as claimed.

Algorithm for Computing the Distance to Connectedness
Theorem 5.5. Let M be an n × n image. There is an algorithm that computes Dist(M, C), that is, the absolute distance of M to connectedness, in time exp(O(n)).
Proof. To prove the theorem, we give a dynamic programming algorithm that computes Dist(M, C) by processing the rows of M one at a time in consecutive order, starting from row 1. Before describing the algorithm in more detail, we define, for each row of M, a string called its status that captures information sufficient for determining which completions of the partial image read so far (to a full n × n image) give a connected image. We start with some preliminary definitions and an observation.

Definition 5.5 (1-blocks and Their Indices).
Consider a string r ∈ {0, 1}^n. The maximal consecutive runs of 1s in r are called 1-blocks. Let b(r) denote the number of 1-blocks in r. We index the 1-blocks with numbers 1 to b(r) in increasing order of the indices of the pixels they contain.
For example, the string 001110011 contains two 1-blocks; the 1-block with three 1s has index 1, and the 1-block with two 1s has index 2.
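A sketch of extracting the 1-blocks of a row and their indices, matching the example above:

```python
def one_blocks(row):
    """Return the 1-blocks of a 0/1 string as (start, end) index pairs."""
    blocks, start = [], None
    for i, ch in enumerate(row):
        if ch == "1" and start is None:
            start = i                      # a new 1-block begins
        elif ch == "0" and start is not None:
            blocks.append((start, i - 1))  # the current 1-block ends
            start = None
    if start is not None:
        blocks.append((start, len(row) - 1))
    return blocks

# one_blocks("001110011") == [(2, 4), (7, 8)], so b(r) = 2, matching the example.
```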
Observation 5.6. Let j_1, j_2, j_1′, j_2′ be indices of 1-blocks of r_i such that 1-blocks j_1 and j_2 (respectively, j_1′ and j_2′) are in the same connected component of G^i_M, but j_1 and j_1′ are not. W.l.o.g. assume j_1 < j_2, j_1′ < j_2′, and j_1 < j_1′. Then either j_1 < j_1′ < j_2′ < j_2 or j_1 < j_2 < j_1′ < j_2′. In other words, 1-blocks from different connected components do not cross, i.e., j_1 < j_1′ < j_2 < j_2′ cannot occur.
For each row r_i of M, we define the status string st(r_i) of length b(r_i) that stores the grouping of the 1-blocks of r_i into connected components in G^i_M. Based on Observation 5.6, it is sufficient to store, for each 1-block in r_i, whether it is the first and/or the last 1-block in its connected component among all 1-blocks of r_i. The jth character of st(r_i) records this information as follows: ... Next, we define the status of a row. If a row contains 1-blocks, then its status is the string of statuses of all its 1-blocks. If a row is 0^n, i.e., has no 1-blocks, then its status indicates whether 1s have been encountered in previous rows. (If they have been, then all subsequent rows have to be 0^n to ensure that the entire image is connected; otherwise, 1s are allowed.) Algorithm 8 gets two consecutive rows, r and r′, of an image matrix and the status string of the first row, r. If its input is not consistent with any connected image, then it returns ⊥; otherwise, it returns the status string of the row r′. It starts by running Algorithm 7 to obtain a graph G that captures the connected components formed by the 1-blocks of r. Then it adds new vertices to G to represent the 1-blocks of r′ and adds new edges to G to capture adjacencies between 1-blocks of r and 1-blocks of r′. In the end, it computes the connected components of the resulting graph and assigns status symbols to each 1-block of r′, as specified in Definition 5.7.
Finally, we present Algorithm 9, which computes the distance to connectedness. For each row r_i of the image matrix M, it computes the following costs: ... Since the number of 1-blocks in a row is at most ⌈n/2⌉, the number of all possible configurations for a row is at most 2^n · 4^{⌈n/2⌉} = O(4^n). Algorithm 9 computes all costs of the form cost(i + 1, ·, ·) from all costs of the form cost(i, ·, ·) in time 8^n · poly(n). In the final step, to compute the absolute distance Dist(M, C), the algorithm takes the minimum over the costs computed for the last row.

LOWER BOUNDS ON THE QUERY COMPLEXITY
The proofs of our lower bounds on the query complexity of estimating the distance to the half-plane property and to convexity use the following standard fact about distinguishing random coins with different biases. The bias of a coin is the probability that it lands heads. The following theorem is a paraphrased version of Theorem 1.5 in Reference [26].

Theorem 6.1 (Distinguishing Two Coins). Let ϵ_0, γ ∈ (0, 1/2). Given a coin with bias either 1/2 or 1/2 + ϵ_0, to determine the bias with probability at least 1 − γ, one needs Ω(log(1/γ)/ϵ_0^2) coin tosses.
Theorem 6.2. Let H be the set of half-plane images and C be the set of convex images. Let ϵ ∈ (0, 1/2) and P ∈ {H, C}. Every algorithm that approximates the distance of an image to P within additive error ϵ (with probability at least 2/3) requires Ω(1/ϵ^2) queries.
Proof. Given p ∈ (0, 1/2], we define a distribution D_p on n × n images as follows: in an image drawn from D_p, each pixel is black with probability p and white otherwise, independently of all other pixels.

Lemma 6.3. Let M be an image drawn from D_p. (1) dist(M, H) ∈ (p − ϵ_1, p + ϵ_1) with probability at least 5/6, where ϵ_1 = 6√(ln n)/n.
Proof. The following claim is used to prove both parts of the lemma. ... Proof. Let B (respectively, W) be the set of pixels in region R that are black (respectively, white) in M. Then |W| = ω_R(M) · n^2 and |B| = β_R(M) · n^2. For each pixel (i, j) ∈ R, let X_{i,j} be the indicator random variable for the event that pixel (i, j) in M is black. Then ..., by linearity of expectation and since E[X_{i,j}] = p.