6 Search Results for "Ding, Hu"


Document
Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning

Authors: Hu Ding

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)


Abstract
In this paper, we study several important geometric optimization problems arising in machine learning. First, we revisit the Minimum Enclosing Ball (MEB) problem in Euclidean space ℝ^d. The problem has been extensively studied before, but real-world machine learning tasks often need to handle large-scale datasets so that we cannot even afford linear time algorithms. Motivated by the recent developments on beyond worst-case analysis, we introduce the notion of stability for MEB, which is natural and easy to understand. Roughly speaking, an instance of MEB is stable, if the radius of the resulting ball cannot be significantly reduced by removing a small fraction of the input points. Under the stability assumption, we present two sampling algorithms for computing radius-approximate MEB with sample complexities independent of the number of input points n. In particular, the second algorithm has the sample complexity even independent of the dimensionality d. We also consider the general case without the stability assumption. We present a hybrid algorithm that can output either a radius-approximate MEB or a covering-approximate MEB, which improves the running time and the number of passes for the previous sublinear MEB algorithms. Further, we extend our proposed notion of stability and design sublinear time algorithms for other geometric optimization problems including MEB with outliers, polytope distance, one-class and two-class linear SVMs (without or with outliers). Our proposed algorithms also work fine for kernels.

Cite as

Hu Ding. Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 38:1-38:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{ding:LIPIcs.ESA.2021.38,
  author =	{Ding, Hu},
  title =	{{Stability Yields Sublinear Time Algorithms for Geometric Optimization in Machine Learning}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{38:1--38:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.38},
  URN =		{urn:nbn:de:0030-drops-146194},
  doi =		{10.4230/LIPIcs.ESA.2021.38},
  annote =	{Keywords: stability, sublinear time, geometric optimization, machine learning}
}
Document
A Sub-Linear Time Framework for Geometric Optimization with Outliers in High Dimensions

Authors: Hu Ding

Published in: LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)


Abstract
Many real-world problems can be formulated as geometric optimization problems in high dimensions, especially in the fields of machine learning and data mining. Moreover, we often need to take into account of outliers when optimizing the objective functions. However, the presence of outliers could make the problems to be much more challenging than their vanilla versions. In this paper, we study the fundamental minimum enclosing ball (MEB) with outliers problem first; partly inspired by the core-set method from Bădoiu and Clarkson, we propose a sub-linear time bi-criteria approximation algorithm based on two novel techniques, the Uniform-Adaptive Sampling method and Sandwich Lemma. To the best of our knowledge, our result is the first sub-linear time algorithm, which has the sample size (i.e., the number of sampled points) independent of both the number of input points n and dimensionality d, for MEB with outliers in high dimensions. Furthermore, we observe that these two techniques can be generalized to deal with a broader range of geometric optimization problems with outliers in high dimensions, including flat fitting, k-center clustering, and SVM with outliers, and therefore achieve the sub-linear time algorithms for these problems respectively.

Cite as

Hu Ding. A Sub-Linear Time Framework for Geometric Optimization with Outliers in High Dimensions. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 38:1-38:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{ding:LIPIcs.ESA.2020.38,
  author =	{Ding, Hu},
  title =	{{A Sub-Linear Time Framework for Geometric Optimization with Outliers in High Dimensions}},
  booktitle =	{28th Annual European Symposium on Algorithms (ESA 2020)},
  pages =	{38:1--38:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-162-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{173},
  editor =	{Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.38},
  URN =		{urn:nbn:de:0030-drops-129045},
  doi =		{10.4230/LIPIcs.ESA.2020.38},
  annote =	{Keywords: minimum enclosing ball, outliers, shape fitting, high dimensions, sub-linear time}
}
Document
Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

Authors: Hu Ding, Haikuo Yu, and Zixiu Wang

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
We study the problem of k-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez’s algorithm, for solving the problem of ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy actually can handle k-center clustering with outliers efficiently, in terms of clustering quality and time complexity. We further show that the greedy approach yields small coreset for the problem in doubling metrics, so as to reduce the time complexity significantly. Our algorithms are easy to implement in practice. We test our method on both synthetic and real datasets. The experimental results suggest that our algorithms can achieve near optimal solutions and yield lower running times comparing with existing methods.

Cite as

Hu Ding, Haikuo Yu, and Zixiu Wang. Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 40:1-40:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.ESA.2019.40,
  author =	{Ding, Hu and Yu, Haikuo and Wang, Zixiu},
  title =	{{Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{40:1--40:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.40},
  URN =		{urn:nbn:de:0030-drops-111613},
  doi =		{10.4230/LIPIcs.ESA.2019.40},
  annote =	{Keywords: k-center clustering, outliers, coreset, doubling metrics, random sampling}
}
Document
On Geometric Prototype and Applications

Authors: Hu Ding and Manni Liu

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)


Abstract
In this paper, we propose to study a new geometric optimization problem called the "geometric prototype" in Euclidean space. Given a set of patterns, where each pattern is represented by a (weighted or unweighted) point set, the geometric prototype can be viewed as the "average pattern" minimizing the total matching cost to them. As a general model, the problem finds many applications in real-world, such as Wasserstein barycenter and ensemble clustering. The dimensionality could be either constant or high, depending on the applications. To our best knowledge, the general geometric prototype problem has yet to be seriously considered by the theory community. To bridge the gap between theory and practice, we first show that a small core-set can be obtained to substantially reduce the data size. Consequently, any existing heuristic or algorithm can run on the core-set to achieve a great improvement on the efficiency. As a new application of core-set, it needs to tackle a couple of challenges particularly in theory. Finally, we test our method on both image and high dimensional clustering datasets; the experimental results remain stable even if we run the algorithms on core-sets much smaller than the original datasets, while the running times are reduced significantly.

Cite as

Hu Ding and Manni Liu. On Geometric Prototype and Applications. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 23:1-23:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.ESA.2018.23,
  author =	{Ding, Hu and Liu, Manni},
  title =	{{On Geometric Prototype and Applications}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{23:1--23:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.23},
  URN =		{urn:nbn:de:0030-drops-94867},
  doi =		{10.4230/LIPIcs.ESA.2018.23},
  annote =	{Keywords: prototype, core-set, Wasserstein barycenter, ensemble clustering}
}
Document
Distributed and Robust Support Vector Machine

Authors: Yangwei Liu, Hu Ding, Ziyun Huang, and Jinhui Xu

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)


Abstract
In this paper, we consider the distributed version of Support Vector Machine (SVM) under the coordinator model, where all input data (i.e., points in R^d space) of SVM are arbitrarily distributed among k nodes in some network with a coordinator which can communicate with all nodes. We investigate two variants of this problem, with and without outliers. For distributed SVM without outliers, we prove a lower bound on the communication complexity and give a distributed (1-epsilon)-approximation algorithm to reach this lower bound, where epsilon is a user specified small constant. For distributed SVM with outliers, we present a (1-epsilon)-approximation algorithm to explicitly remove the influence of outliers. Our algorithm is based on a deterministic distributed top t selection algorithm with communication complexity of O(k log (t)) in the coordinator model. Experimental results on benchmark datasets confirm the theoretical guarantees of our algorithms.

Cite as

Yangwei Liu, Hu Ding, Ziyun Huang, and Jinhui Xu. Distributed and Robust Support Vector Machine. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 54:1-54:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{liu_et_al:LIPIcs.ISAAC.2016.54,
  author =	{Liu, Yangwei and Ding, Hu and Huang, Ziyun and Xu, Jinhui},
  title =	{{Distributed and Robust Support Vector Machine}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{54:1--54:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.54},
  URN =		{urn:nbn:de:0030-drops-68221},
  doi =		{10.4230/LIPIcs.ISAAC.2016.54},
  annote =	{Keywords: Distributed Algorithm, Communication Complexity, Robust Algorithm, SVM}
}
Document
Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance

Authors: Hu Ding, Jing Gao, and Jinhui Xu

Published in: LIPIcs, Volume 51, 32nd International Symposium on Computational Geometry (SoCG 2016)


Abstract
Truth Discovery is an important problem arising in data analytics related fields such as data mining, database, and big data. It concerns about finding the most trustworthy information from a dataset acquired from a number of unreliable sources. Due to its importance, the problem has been extensively studied in recent years and a number techniques have already been proposed. However, all of them are of heuristic nature and do not have any quality guarantee. In this paper, we formulate the problem as a high dimensional geometric optimization problem, called Entropy based Geometric Variance. Relying on a number of novel geometric techniques (such as Log-Partition and Modified Simplex Lemma), we further discover new insights to this problem. We show, for the first time, that the truth discovery problem can be solved with guaranteed quality of solution. Particularly, we show that it is possible to achieve a (1+eps)-approximation within nearly linear time under some reasonable assumptions. We expect that our algorithm will be useful for other data related applications.

Cite as

Hu Ding, Jing Gao, and Jinhui Xu. Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance. In 32nd International Symposium on Computational Geometry (SoCG 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 51, pp. 34:1-34:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{ding_et_al:LIPIcs.SoCG.2016.34,
  author =	{Ding, Hu and Gao, Jing and Xu, Jinhui},
  title =	{{Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance}},
  booktitle =	{32nd International Symposium on Computational Geometry (SoCG 2016)},
  pages =	{34:1--34:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-009-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{51},
  editor =	{Fekete, S\'{a}ndor and Lubiw, Anna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2016.34},
  URN =		{urn:nbn:de:0030-drops-59264},
  doi =		{10.4230/LIPIcs.SoCG.2016.34},
  annote =	{Keywords: geometric optimization, data mining, high dimension, entropy}
}
  • Refine by Author
  • 6 Ding, Hu
  • 2 Xu, Jinhui
  • 1 Gao, Jing
  • 1 Huang, Ziyun
  • 1 Liu, Manni
  • Show More...

  • Refine by Classification
  • 3 Theory of computation → Approximation algorithms analysis
  • 1 Theory of computation → Computational geometry

  • Refine by Keyword
  • 2 geometric optimization
  • 2 outliers
  • 1 Communication Complexity
  • 1 Distributed Algorithm
  • 1 Robust Algorithm
  • Show More...

  • Refine by Type
  • 6 document

  • Refine by Publication Year
  • 2 2016
  • 1 2018
  • 1 2019
  • 1 2020
  • 1 2021

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail