Clustering in Polygonal Domains

Authors: Mark de Berg, Leyla Biabani, Morteza Monemizadeh, and Leonidas Theocharous

Published in: LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

We study various clustering problems for a set D of n points in a polygonal domain P under the geodesic distance. We start by studying the discrete k-median problem for D in P. We develop an exact algorithm which runs in time poly(n,m) + n^{O(√k)}, where m is the complexity of the domain. Subsequently, we show that our approach can also be applied to solve the k-center problem with z outliers in the same running time. Next, we turn our attention to approximation algorithms. In particular, we study the k-center problem in a simple polygon and show how to obtain a (1+ε)-approximation algorithm which runs in time 2^{O((k log(k))/ε)} (n log(m) + m). To obtain this, we demonstrate that a previous approach by Bădoiu et al. [Bădoiu et al., 2002; Bădoiu and Clarkson, 2003] that works in ℝ^d carries over to the setting of simple polygons. Finally, we study the 1-center problem in a simple polygon in the presence of z outliers. We show that a coreset C of size O(z) exists, such that the 1-center of C is a 3-approximation of the 1-center of D, when z outliers are allowed. This result is actually more general and carries over to any metric space, which to the best of our knowledge was not known so far. By extending this approach, we show that for the 1-center problem under the Euclidean metric in ℝ², there exists an ε-coreset of size O(z/ε).

Facility Location in the Sublinear Geometric Model

Authors: Morteza Monemizadeh

Published in: LIPIcs, Volume 275, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)

In the sublinear geometric model, we are provided with oracle access to a point set P of n points in a bounded discrete space [Δ]², where Δ = n^{O(1)} is a polynomially bounded number in n. That is, we do not have direct access to the points, but we can make certain types of queries and there is an oracle that responds to our queries. The type of queries that we assume we can make in this paper are range counting queries where ranges are axis-aligned rectangles (which are basic primitives in databases [Srikanta Tirthapura and David P. Woodruff, 2012; Bentley, 1975; Mark de Berg et al., 2008], computational geometry [Pankaj K. Agarwal, 2004; Pankaj K. Agarwal et al., 1996; Boris Aronov et al., 2010; Boris Aronov et al., 2009], and machine learning [Menachem Sadigurschi and Uri Stemmer, 2021; Long and Tan, 1998; Michael J. Kearns and Umesh V. Vazirani, 1995; Michael J. Kearns and Umesh V. Vazirani, 1994]). The oracle then answers these queries by returning the number of points that are in the queried ranges. Let Alg be an algorithm that (exactly or approximately) solves a problem 𝒫 in the sublinear geometric model. The query complexity of Alg is measured in terms of the number of queries that Alg makes to solve 𝒫. In this paper, we study the complexity of the (uniform) Euclidean facility location problem in the sublinear geometric model. We develop a randomized sublinear algorithm that, with high probability, (1+ε)-approximates the cost of the Euclidean facility location problem of the point set P in the sublinear geometric model using Õ(√n) range counting queries. We complement this result by showing that approximating the cost of the Euclidean facility location problem within an o(log(n))-factor in the sublinear geometric model using the sampling strategy that we propose for our sublinear algorithm needs Ω̃(n^{1/4}) RangeCount queries. We leave it as an open problem whether such a polynomial lower bound on the number of RangeCount queries exists for any randomized sublinear algorithm that approximates the cost of the facility location problem within a constant factor.
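
The query model described in this abstract can be illustrated with a minimal sketch. The class name `RangeCountOracle`, the method `range_count`, and the closed-rectangle convention are assumptions made here for illustration; they are not taken from the paper. The oracle hides the points and only reports how many lie in an axis-aligned rectangle, while a counter tracks the query complexity.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


class RangeCountOracle:
    """Hypothetical oracle: hides the point set, answers rectangle counts."""

    def __init__(self, points: Iterable[Point]) -> None:
        self._points: List[Point] = list(points)  # not visible to the algorithm
        self.queries = 0                          # number of queries made so far

    def range_count(self, x_lo: int, x_hi: int, y_lo: int, y_hi: int) -> int:
        """Count points in the closed rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
        self.queries += 1
        return sum(
            1
            for (x, y) in self._points
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi
        )


oracle = RangeCountOracle([(1, 1), (2, 5), (7, 3), (7, 8)])
inside = oracle.range_count(1, 7, 1, 5)  # the point (7, 8) falls outside
```

This naive oracle scans all points per query; it only models the interface, not an efficient data structure.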

TSP in a Simple Polygon

Authors: Henk Alkema, Mark de Berg, Morteza Monemizadeh, and Leonidas Theocharous

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

We study the Traveling Salesman Problem inside a simple polygon. In this problem, which we call TSP in a Simple Polygon, we wish to compute a shortest tour that visits a given set S of n sites inside a simple polygon P with m edges while staying inside the polygon. This natural problem has, to the best of our knowledge, not been studied so far from a theoretical perspective. It can be solved exactly in poly(n,m) + 2^{O(√n log n)} time, using an algorithm by Marx, Pilipczuk, and Pilipczuk (FOCS 2018) for Subset TSP as a subroutine. We present a much simpler algorithm that solves TSP in a Simple Polygon directly and that has the same running time.

Clique-Based Separators for Geometric Intersection Graphs

Authors: Mark de Berg, Sándor Kisfaludi-Bak, Morteza Monemizadeh, and Leonidas Theocharous

Published in: LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)

Let F be a set of n objects in the plane and let 𝒢^{×}(F) be its intersection graph. A balanced clique-based separator of 𝒢^{×}(F) is a set 𝒮 consisting of cliques whose removal partitions 𝒢^{×}(F) into components of size at most δn, for some fixed constant δ < 1. The weight of a clique-based separator is defined as ∑_{C ∈ 𝒮} log(|C|+1). Recently De Berg et al. (SICOMP 2020) proved that if F consists of convex fat objects, then 𝒢^{×}(F) admits a balanced clique-based separator of weight O(√n). We extend this result in several directions, obtaining the following results.
- Map graphs admit a balanced clique-based separator of weight O(√n), which is tight in the worst case.
- Intersection graphs of pseudo-disks admit a balanced clique-based separator of weight O(n^{2/3} log n). If the pseudo-disks are polygonal and of total complexity O(n) then the weight of the separator improves to O(√n log n).
- Intersection graphs of geodesic disks inside a simple polygon admit a balanced clique-based separator of weight O(n^{2/3} log n).
- Visibility-restricted unit-disk graphs in a polygonal domain with r reflex vertices admit a balanced clique-based separator of weight O(√n + r log(n/r)), which is tight in the worst case.
These results immediately imply sub-exponential algorithms for Maximum Independent Set (and, hence, Vertex Cover), for Feedback Vertex Set, and for q-Coloring for constant q in these graph classes.
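
As a small worked illustration of the weight ∑_{C ∈ 𝒮} log(|C|+1) defined in this abstract, the sketch below evaluates it for a hand-made separator. The function name `separator_weight` and the example cliques are hypothetical, and the base-2 logarithm is an assumption: the abstract does not fix the base, which only changes the weight by a constant factor.

```python
import math
from typing import Collection, Hashable, Iterable


def separator_weight(cliques: Iterable[Collection[Hashable]]) -> float:
    """Sum of log2(|C| + 1) over the cliques C of a separator."""
    return sum(math.log2(len(clique) + 1) for clique in cliques)


# Three cliques of sizes 3, 1 and 7 (vertex labels are arbitrary).
example = [{"a", "b", "c"}, {"d"}, {"e", "f", "g", "h", "i", "j", "k"}]
weight = separator_weight(example)  # log2(4) + log2(2) + log2(8)
```

The logarithm is what makes large cliques cheap: a clique of 7 vertices contributes only 3 to the weight here, whereas it would contribute 7 to the size of an ordinary vertex separator.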

Maximum-Weight Matching in Sliding Windows and Beyond

Authors: Leyla Biabani, Mark de Berg, and Morteza Monemizadeh

Published in: LIPIcs, Volume 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021)

We study the maximum-weight matching problem in the sliding-window model. In this model, we are given an adversarially ordered stream of edges of an underlying edge-weighted graph G(V,E), and a parameter L specifying the window size, and we want to maintain an approximation of the maximum-weight matching of the current graph G(t); here G(t) is defined as the subgraph of G consisting of the edges that arrived during the time interval [max(t-L,1),t], where t is the current time. The goal is to do this with Õ(n) space, where n is the number of vertices of G. We present a deterministic (3.5+ε)-approximation algorithm for this problem, thus significantly improving the (6+ε)-approximation algorithm due to Crouch and Stubbs [Michael S. Crouch and Daniel M. Stubbs, 2014]. We also present a generic machinery for approximating subadditive functions in the sliding-window model. A function f is called subadditive if for every pair of disjoint substreams A, B of a stream S it holds that f(AB) ⩽ f(A) + f(B), where AB denotes the concatenation of A and B. We show that given an α-approximation algorithm for a subadditive function f in the insertion-only model we can maintain a (2α+ε)-approximation of f in the sliding-window model. This improves upon a recent result of Krauthgamer and Reitblat [Robert Krauthgamer and David Reitblat, 2019], who obtained a (2α²+ε)-approximation.
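
The definition of the current graph G(t) can be made concrete with a short sketch. It is a naive illustration that stores the whole stream, which is exactly what a sliding-window algorithm must avoid; the function name `window_edges` and the convention that the i-th edge of the list arrives at time i are assumptions made here.

```python
from typing import List, Sequence, TypeVar

Edge = TypeVar("Edge")


def window_edges(stream: Sequence[Edge], t: int, L: int) -> List[Edge]:
    """Edges that arrived during [max(t - L, 1), t]; stream[i - 1] arrives at time i."""
    start = max(t - L, 1)
    return list(stream[start - 1:t])


stream = ["a", "b", "c", "d", "e", "f"]        # placeholder edge labels
current = window_edges(stream, t=5, L=2)       # arrival times 3, 4 and 5
early = window_edges(stream, t=2, L=5)         # window is clipped at time 1
```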

k-Center Clustering with Outliers in the Sliding-Window Model

Authors: Mark de Berg, Morteza Monemizadeh, and Yu Zhong

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

The k-center problem for a point set P asks for a collection of k congruent balls (that is, balls of equal radius) that together cover all the points in P and whose radius is minimized. The k-center problem with outliers is defined similarly, except that z of the points in P need not be covered, for a given parameter z. We study the k-center problem with outliers in data streams in the sliding-window model. In this model we are given a possibly infinite stream P = ⟨p₁,p₂,p₃,…⟩ of points and a time window of length W, and we want to maintain a small sketch of the set P(t) of points currently in the window such that using the sketch we can approximately solve the problem on P(t). We present the first algorithm for the k-center problem with outliers in the sliding-window model. The algorithm works for the case where the points come from a space of bounded doubling dimension and it maintains a set S(t) such that an optimal solution on S(t) gives a (1+ε)-approximate solution on P(t). The algorithm uses O((kz/ε^d) log σ) storage, where d is the doubling dimension of the underlying space and σ is the spread of the points in the stream. Algorithms providing a (1+ε)-approximation were not even known in the setting without outliers or in the insertion-only setting with outliers. We also present a lower bound showing that any algorithm that provides a (1+ε)-approximation must use Ω((kz/ε) log σ) storage.
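
To illustrate the objective of k-center with outliers, the sketch below evaluates the radius needed by a fixed set of centers when z points may be left uncovered. It only evaluates a candidate solution; it does not compute optimal centers and is not the streaming algorithm of the paper. The function name `radius_with_outliers` and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def radius_with_outliers(
    points: Sequence[Point], centers: Sequence[Point], z: int
) -> float:
    """Smallest common radius so that balls around `centers` cover all but z points."""
    # Distance from each point to its nearest center, in increasing order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    if z >= len(dists):
        return 0.0  # every point may be discarded as an outlier
    # Discard the z farthest points; the next farthest determines the radius.
    return dists[len(dists) - 1 - z]


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
ctrs = [(0.0, 0.0), (10.0, 0.0)]
r_one_outlier = radius_with_outliers(pts, ctrs, z=1)  # (100, 0) is ignored
r_no_outlier = radius_with_outliers(pts, ctrs, z=0)   # (100, 0) must be covered
```

In this example a single far-away point inflates the radius from 1 to 90, which is the kind of sensitivity the outlier parameter z is meant to remove.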

The Sparse Awakens: Streaming Algorithms for Matching Size Estimation in Sparse Graphs

Authors: Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and S. Muthukrishnan

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Estimating the size of the maximum matching is a canonical problem in graph analysis, and one that has attracted extensive study over a range of different computational models. We present improved streaming algorithms for approximating the size of a maximum matching in sparse (bounded arboricity) graphs.
* (Insert-Only Streams) We present a one-pass algorithm that takes O(α log n) space and approximates the size of the maximum matching in graphs with arboricity α within a factor of O(α). This improves significantly upon the state-of-the-art Õ(α n^{2/3})-space streaming algorithms, and is the first poly-logarithmic space algorithm for this problem.
* (Dynamic Streams) Given a dynamic graph stream (i.e., inserts and deletes) of edges of an underlying α-bounded arboricity graph, we present a one-pass algorithm that uses space Õ(α^{10/3} n^{2/3}) and returns an O(α)-estimator for the size of the maximum matching on the condition that the number of edge deletions in the stream is bounded by O(α n). For this class of inputs, our algorithm improves the state-of-the-art Õ(α n^{4/5})-space algorithms, where the Õ(·) notation hides factors logarithmic in n.
In contrast to prior work, our results take more advantage of the streaming access to the input and characterize the matching size based on the ordering of the edges in the stream in addition to the degree distributions and structural properties of the sparse graphs.

Testable Bounded Degree Graph Properties Are Random Order Streamable

Authors: Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, and Christian Sohler

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

We study which property testing and sublinear time algorithms can be transformed into graph streaming algorithms for random order streams. Our main result is that for bounded degree graphs, any property that is constant-query testable in the adjacency list model can be tested with constant space in a single pass in random order streams. Our result is obtained by estimating the distribution of local neighborhoods of the vertices on a random order graph stream using constant space. We then show that our approach can also be applied to constant time approximation algorithms for bounded degree graphs in the adjacency list model: As an example, we obtain a constant-space single-pass random order streaming algorithm for approximating the size of a maximum matching with additive error εn (n is the number of nodes). Our result establishes for the first time that a large class of sublinear algorithms can be simulated in random order streams, while Ω(n) space is needed for many graph streaming problems for adversarial orders.

Clustering on Sliding Windows in Polylogarithmic Space

Authors: Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh

Published in: LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)

In PODS 2003, Babcock, Datar, Motwani and O'Callaghan gave the first streaming solution for the k-median problem on sliding windows using O((k/τ^4) W^{2τ} log^2 W) space, with an O(2^{O(1/τ)}) approximation factor, where W is the window size and τ ∈ (0,1/2) is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade. In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O'Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result given in Babcock et al. In particular, we give the first polylogarithmic space (α,β)-approximation for metric k-median clustering in the sliding window model, where α and β are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky, which allows us to apply well-known techniques for solving problems in the sliding window model to functions that are not smooth, such as the k-median cost.

