DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.75

In-Range Farthest Point Queries and Related Problem in High Dimensions

Authors: Ziyun Huang and Jinhui Xu

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

Range-aggregate query is an important type of queries with numerous applications. It aims to obtain some structural information (defined by an aggregate function F(⋅)) of the points (from a point set P) inside a given query range B. In this paper, we study the range-aggregate query problem in high dimensional space for two aggregate functions: (1) F(P ∩ B) is the farthest point in P ∩ B to a query point q in ℝ^d and (2) F(P ∩ B) is the minimum enclosing ball (MEB) of P ∩ B. For problem (1), called In-Range Farthest Point (IFP) Query, we develop a bi-criteria approximation scheme: For any ε > 0 that specifies the approximation ratio of the farthest distance and any γ > 0 that measures the "fuzziness" of the query range, we show that it is possible to pre-process P into a data structure of size Õ_{ε,γ}(dn^{1+ρ}) in Õ_{ε,γ}(dn^{1+ρ}) time such that given any ℝ^d query ball B and query point q, it outputs in Õ_{ε,γ}(dn^ρ) time a point p that is a (1-ε)-approximation of the farthest point to q among all points lying in a (1+γ)-expansion B(1+γ) of B, where 0 < ρ < 1 is a constant depending on ε and γ and the hidden constants in big-O notations depend only on ε, γ and Polylog(nd). For problem (2), we show that the IFP result can be applied to develop query scheme with similar time and space complexities to achieve a (1+ε)-approximation for MEB. To the best of our knowledge, these are the first theoretical results on such high dimensional range-aggregate query problems. Our results are based on several new techniques, such as multi-scale construction and ball difference range query, which are interesting in their own rights and could be potentially used to solve other range-aggregate problems in high dimensional space.

Cite as

Ziyun Huang and Jinhui Xu. In-Range Farthest Point Queries and Related Problem in High Dimensions. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 75:1-75:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.ICALP.2022.75,
  author =	{Huang, Ziyun and Xu, Jinhui},
  title =	{{In-Range Farthest Point Queries and Related Problem in High Dimensions}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{75:1--75:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.75},
  URN =		{urn:nbn:de:0030-drops-164167},
  doi =		{10.4230/LIPIcs.ICALP.2022.75},
  annote =	{Keywords: Farthest Point Query, Range Aggregate Query, Minimum Enclosing Ball, Approximation, High Dimensional Space}
}

Document

DOI: 10.4230/LIPIcs.ESA.2021.38

Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning

Authors: Hu Ding

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)

Abstract

In this paper, we study several important geometric optimization problems arising in machine learning. First, we revisit the Minimum Enclosing Ball (MEB) problem in Euclidean space ℝ^d. The problem has been extensively studied before, but real-world machine learning tasks often need to handle large-scale datasets so that we cannot even afford linear time algorithms. Motivated by the recent developments on beyond worst-case analysis, we introduce the notion of stability for MEB, which is natural and easy to understand. Roughly speaking, an instance of MEB is stable, if the radius of the resulting ball cannot be significantly reduced by removing a small fraction of the input points. Under the stability assumption, we present two sampling algorithms for computing radius-approximate MEB with sample complexities independent of the number of input points n. In particular, the second algorithm has the sample complexity even independent of the dimensionality d. We also consider the general case without the stability assumption. We present a hybrid algorithm that can output either a radius-approximate MEB or a covering-approximate MEB, which improves the running time and the number of passes for the previous sublinear MEB algorithms. Further, we extend our proposed notion of stability and design sublinear time algorithms for other geometric optimization problems including MEB with outliers, polytope distance, one-class and two-class linear SVMs (without or with outliers). Our proposed algorithms also work fine for kernels.

Cite as

Hu Ding. Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 38:1-38:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{ding:LIPIcs.ESA.2021.38,
  author =	{Ding, Hu},
  title =	{{Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{38:1--38:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.38},
  URN =		{urn:nbn:de:0030-drops-146194},
  doi =		{10.4230/LIPIcs.ESA.2021.38},
  annote =	{Keywords: stability, sublinear time, geometric optimization, machine learning}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2020.5

A Unified Framework of FPT Approximation Algorithms for Clustering Problems

Authors: Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)

Abstract

In this paper, we present a framework for designing FPT approximation algorithms for many k-clustering problems. Our results are based on a new technique for reducing search spaces. A reduced search space is a small subset of the input data that has the guarantee of containing k clients close to the facilities opened in an optimal solution for any clustering problem we consider. We show, somewhat surprisingly, that greedily sampling O(k) clients yields the desired reduced search space, based on which we obtain FPT(k)-time algorithms with improved approximation guarantees for problems such as capacitated clustering, lower-bounded clustering, clustering with service installation costs, fault tolerant clustering, and priority clustering.

Cite as

Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang. A Unified Framework of FPT Approximation Algorithms for Clustering Problems. In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, pp. 5:1-5:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.ISAAC.2020.5,
  author =	{Feng, Qilong and Zhang, Zhen and Huang, Ziyun and Xu, Jinhui and Wang, Jianxin},
  title =	{{A Unified Framework of FPT Approximation Algorithms for Clustering Problems}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{5:1--5:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.5},
  URN =		{urn:nbn:de:0030-drops-133495},
  doi =		{10.4230/LIPIcs.ISAAC.2020.5},
  annote =	{Keywords: clustering, approximation algorithms, fixed-parameter tractability}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.26

Small Candidate Set for Translational Pattern Search

Authors: Ziyun Huang, Qilong Feng, Jianxin Wang, and Jinhui Xu

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

In this paper, we study the following pattern search problem: Given a pair of point sets A and B in fixed dimensional space R^d, with |B| = n, |A| = m and n >= m, the pattern search problem is to find the translations T’s of A such that each of the identified translations induces a matching between T(A) and a subset B' of B with cost no more than some given threshold, where the cost is defined as the minimum bipartite matching cost of T(A) and B'. We present a novel algorithm to produce a small set of candidate translations for the pattern search problem. For any B' subseteq B with |B'| = |A|, there exists at least one translation T in the candidate set such that the minimum bipartite matching cost between T(A) and B' is no larger than (1+epsilon) times the minimum bipartite matching cost between A and B' under any translation (i.e., the optimal translational matching cost). We also show that there exists an alternative solution to this problem, which constructs a candidate set of size O(n log^2 n) in O(n log^2 n) time with high probability of success. As a by-product of our construction, we obtain a weak epsilon-net for hypercube ranges, which significantly improves the construction time and the size of the candidate set. Our technique can be applied to a number of applications, including the translational pattern matching problem.

Cite as

Ziyun Huang, Qilong Feng, Jianxin Wang, and Jinhui Xu. Small Candidate Set for Translational Pattern Search. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 26:1-26:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.ISAAC.2019.26,
  author =	{Huang, Ziyun and Feng, Qilong and Wang, Jianxin and Xu, Jinhui},
  title =	{{Small Candidate Set for Translational Pattern Search}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{26:1--26:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.26},
  URN =		{urn:nbn:de:0030-drops-115222},
  doi =		{10.4230/LIPIcs.ISAAC.2019.26},
  annote =	{Keywords: Bipartite matching, Alignment, Discretization, Approximate algorithm}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.61

Improved Algorithms for Clustering with Outliers

Authors: Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

Clustering is a fundamental problem in unsupervised learning. In many real-world applications, the to-be-clustered data often contains various types of noises and thus needs to be removed from the learning process. To address this issue, we consider in this paper two variants of such clustering problems, called k-median with m outliers and k-means with m outliers. Existing techniques for both problems either incur relatively large approximation ratios or can only efficiently deal with a small number of outliers. In this paper, we present improved solution to each of them for the case where k is a fixed number and m could be quite large. Particularly, we gave the first PTAS for the k-median problem with outliers in Euclidean space R^d for possibly high m and d. Our algorithm runs in O(nd((1/epsilon)(k+m))^(k/epsilon)^O(1)) time, which considerably improves the previous result (with running time O(nd(m+k)^O(m+k) + (1/epsilon)k log n)^O(1))) given by [Feldman and Schulman, SODA 2012]. For the k-means with outliers problem, we introduce a (6+epsilon)-approximation algorithm for general metric space with running time O(n(beta (1/epsilon)(k+m))^k) for some constant beta>1. Our algorithm first uses the k-means++ technique to sample O((1/epsilon)(k+m)) points from input and then select the k centers from them. Compared to the more involving existing techniques, our algorithms are much simpler, i.e., using only random sampling, and achieving better performance ratios.

Cite as

Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang. Improved Algorithms for Clustering with Outliers. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 61:1-61:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.ISAAC.2019.61,
  author =	{Feng, Qilong and Zhang, Zhen and Huang, Ziyun and Xu, Jinhui and Wang, Jianxin},
  title =	{{Improved Algorithms for Clustering with Outliers}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{61:1--61:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.61},
  URN =		{urn:nbn:de:0030-drops-115573},
  doi =		{10.4230/LIPIcs.ISAAC.2019.61},
  annote =	{Keywords: Clustering with Outliers, Approximation, Random Sampling}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2017.47

An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions

Authors: Ziyun Huang and Jinhui Xu

Published in: LIPIcs, Volume 92, 28th International Symposium on Algorithms and Computation (ISAAC 2017)

Abstract

In this paper, we consider the following sum query problem: Given a point set P in R^d, and a distance-based function f(p,q) (i.e. a function of the distance between p and q) satisfying some general properties, the goal is to develop a data structure and a query algorithm for efficiently computing a (1+epsilon)-approximate solution to the sum sum_{p in P} f(p,q) for any query point q in R^d and any small constant epsilon>0. Existing techniques for this problem are mainly based on some core-set techniques which often have difficulties to deal with functions with local domination property. Based on several new insights to this problem, we develop in this paper a novel technique to overcome these encountered difficulties. Our algorithm is capable of answering queries with high success probability in time no more than ~O_{epsilon,d}(n^{0.5 + c}), and the underlying data structure can be constructed in ~O_{epsilon,d}(n^{1+c}) time for any c>0, where the hidden constant has only polynomial dependence on 1/epsilon and d. Our technique is simple and can be easily implemented for practical purpose.

Cite as

Ziyun Huang and Jinhui Xu. An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 92, pp. 47:1-47:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.ISAAC.2017.47,
  author =	{Huang, Ziyun and Xu, Jinhui},
  title =	{{An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions}},
  booktitle =	{28th International Symposium on Algorithms and Computation (ISAAC 2017)},
  pages =	{47:1--47:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-054-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{92},
  editor =	{Okamoto, Yoshio and Tokuyama, Takeshi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2017.47},
  URN =		{urn:nbn:de:0030-drops-82483},
  doi =		{10.4230/LIPIcs.ISAAC.2017.47},
  annote =	{Keywords: Sum Query, Distance-based Function, Local Domination, High Dimen- sions, Data Structure}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2016.54

Distributed and Robust Support Vector Machine

Authors: Yangwei Liu, Hu Ding, Ziyun Huang, and Jinhui Xu

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)

Abstract

In this paper, we consider the distributed version of Support Vector Machine (SVM) under the coordinator model, where all input data (i.e., points in R^d space) of SVM are arbitrarily distributed among k nodes in some network with a coordinator which can communicate with all nodes. We investigate two variants of this problem, with and without outliers. For distributed SVM without outliers, we prove a lower bound on the communication complexity and give a distributed (1-epsilon)-approximation algorithm to reach this lower bound, where epsilon is a user specified small constant. For distributed SVM with outliers, we present a (1-epsilon)-approximation algorithm to explicitly remove the influence of outliers. Our algorithm is based on a deterministic distributed top t selection algorithm with communication complexity of O(k log (t)) in the coordinator model. Experimental results on benchmark datasets confirm the theoretical guarantees of our algorithms.

Cite as

Yangwei Liu, Hu Ding, Ziyun Huang, and Jinhui Xu. Distributed and Robust Support Vector Machine. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 54:1-54:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{liu_et_al:LIPIcs.ISAAC.2016.54,
  author =	{Liu, Yangwei and Ding, Hu and Huang, Ziyun and Xu, Jinhui},
  title =	{{Distributed and Robust Support Vector Machine}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{54:1--54:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.54},
  URN =		{urn:nbn:de:0030-drops-68221},
  doi =		{10.4230/LIPIcs.ISAAC.2016.54},
  annote =	{Keywords: Distributed Algorithm, Communication Complexity, Robust Algorithm, SVM}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2016.34

Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance

Authors: Hu Ding, Jing Gao, and Jinhui Xu

Published in: LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)

Abstract

Truth Discovery is an important problem arising in data analytics related fields such as data mining, database, and big data. It concerns about finding the most trustworthy information from a dataset acquired from a number of unreliable sources. Due to its importance, the problem has been extensively studied in recent years and a number techniques have already been proposed. However, all of them are of heuristic nature and do not have any quality guarantee. In this paper, we formulate the problem as a high dimensional geometric optimization problem, called Entropy based Geometric Variance. Relying on a number of novel geometric techniques (such as Log-Partition and Modified Simplex Lemma), we further discover new insights to this problem. We show, for the first time, that the truth discovery problem can be solved with guaranteed quality of solution. Particularly, we show that it is possible to achieve a (1+eps)-approximation within nearly linear time under some reasonable assumptions. We expect that our algorithm will be useful for other data related applications.

Cite as

Hu Ding, Jing Gao, and Jinhui Xu. Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 34:1-34:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.SoCG.2016.34,
  author =	{Ding, Hu and Gao, Jing and Xu, Jinhui},
  title =	{{Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance}},
  booktitle =	{32nd International Symposium on Computational Geometry (SoCG 2016)},
  pages =	{34:1--34:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-009-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{51},
  editor =	{Fekete, S\'{a}ndor and Lubiw, Anna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.34},
  URN =		{urn:nbn:de:0030-drops-59264},
  doi =		{10.4230/LIPIcs.SoCG.2016.34},
  annote =	{Keywords: geometric optimization, data mining, high dimension, entropy}
}

8 Search Results for "Xu, Jinhui"

In-Range Farthest Point Queries and Related Problem in High Dimensions

Abstract

Cite as

Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning

Abstract

Cite as

A Unified Framework of FPT Approximation Algorithms for Clustering Problems

Abstract

Cite as

Small Candidate Set for Translational Pattern Search

Abstract

Cite as

Improved Algorithms for Clustering with Outliers

Abstract

Cite as

An Efficient Sum Query Algorithm for Distance-based Locally Dominating Functions

Abstract

Cite as

Distributed and Robust Support Vector Machine

Abstract

Cite as

Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance

Abstract

Cite as

Thanks for your feedback!

Could not send message