2 Search Results for "Lee, Jasper C.H."


Document
Optimal Sub-Gaussian Mean Estimation in Very High Dimensions

Authors: Jasper C.H. Lee and Paul Valiant

Published in: LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)


Abstract
We address the problem of mean estimation in very high dimensions, in the high-probability regime parameterized by the failure probability δ. For a distribution with covariance Σ, let its "effective dimension" be d_eff = Tr(Σ)/λ_max(Σ). For the regime where d_eff = ω(log²(1/δ)), we show the first algorithm whose sample complexity is optimal to within a 1+o(1) factor. The algorithm has a surprisingly simple structure: 1) re-center the samples using a known sub-Gaussian estimator, 2) carefully choose an easy-to-compute positive integer t and then remove the t samples farthest from the origin, and 3) return the sample mean of the remaining samples. The core of the analysis relies on a novel vector Bernstein-type tail bound, showing that under general conditions, the sample mean of a bounded high-dimensional distribution is highly concentrated around a spherical shell.
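
As an illustration of the three-step structure described above, the following is a minimal NumPy sketch under assumptions of our own: the choice of t, the coordinate-wise median used as the rough pre-estimate, and the toy data are hypothetical stand-ins, not the paper's precise algorithm or analysis.

import numpy as np

def truncated_mean(samples, delta, pre_estimate):
    """Sketch of the three-step estimator: re-center, drop the t farthest points, average the rest."""
    centered = samples - pre_estimate                   # 1) re-center using a rough sub-Gaussian estimate
    t = max(1, int(np.ceil(np.log(1.0 / delta))))       # hypothetical choice of t; the paper derives its own
    order = np.argsort(np.linalg.norm(centered, axis=1))
    keep = order[: len(samples) - t]                    # 2) remove the t samples farthest from the origin
    return pre_estimate + centered[keep].mean(axis=0)   # 3) sample mean of the remaining samples

# Toy usage: heavy-tailed samples in d = 50, coordinate-wise median as the rough pre-estimate.
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(2000, 50))
print(truncated_mean(X, delta=0.01, pre_estimate=np.median(X, axis=0)))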

Cite as

Jasper C.H. Lee and Paul Valiant. Optimal Sub-Gaussian Mean Estimation in Very High Dimensions. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 98:1-98:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{lee_et_al:LIPIcs.ITCS.2022.98,
  author =	{Lee, Jasper C.H. and Valiant, Paul},
  title =	{{Optimal Sub-Gaussian Mean Estimation in Very High Dimensions}},
  booktitle =	{13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages =	{98:1--98:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-217-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{215},
  editor =	{Braverman, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.98},
  URN =		{urn:nbn:de:0030-drops-156942},
  doi =		{10.4230/LIPIcs.ITCS.2022.98},
  annote =	{Keywords: High-dimensional mean estimation}
}

Document
Finding an Approximate Mode of a Kernel Density Estimate

Authors: Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, and Wai Ming Tai

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)


Abstract
Given points P = {p₁, …, p_n} ⊂ ℝ^d, how do we find a point x which approximately maximizes the function (1/n) ∑_{p_i ∈ P} e^{-‖p_i - x‖²}? In other words, how do we find an approximate mode of a Gaussian kernel density estimate (KDE) of P? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode-finding problem is widely applicable. However, it is poorly understood algorithmically. We provide fast and provably accurate approximation algorithms for mode finding in both the low- and high-dimensional settings. For low (constant) dimension, our main contribution is a reduction to solving systems of polynomial inequalities. For high dimension, we prove the first dimensionality-reduction result for KDE mode finding. The latter result leverages Johnson-Lindenstrauss projection, Kirszbraun's classic extension theorem, and, perhaps surprisingly, the mean-shift heuristic for mode finding. For a constant approximation factor, these algorithms run in O(n (log n)^{O(d)}) and O(nd + (log n)^{O(log³ n)}) time, respectively; more precisely, both are proven as (1+ε)-approximation guarantees. Furthermore, for the special case of d = 2, we give a combinatorial algorithm running in O(n log² n) time. We empirically demonstrate that the random projection approach and the 2-dimensional algorithm improve over the state-of-the-art mode-finding heuristics.
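
The objective in the first sentence, and the mean-shift heuristic mentioned above, can be written down in a few lines; the sketch below is only a baseline for orientation (the iteration count, starting point, and toy data are assumptions), not the paper's reduction to polynomial inequalities or its dimensionality-reduction algorithm.

import numpy as np

def kde_value(P, x):
    """The abstract's objective: (1/n) * sum_i exp(-||p_i - x||^2)."""
    return np.mean(np.exp(-np.sum((P - x) ** 2, axis=1)))

def mean_shift(P, x0, iters=100):
    """Plain mean-shift iteration for this Gaussian kernel (the heuristic named in the abstract)."""
    x = x0.copy()
    for _ in range(iters):
        w = np.exp(-np.sum((P - x) ** 2, axis=1))  # kernel weight of each data point at the current x
        x = w @ P / w.sum()                        # move x to the weighted average of the data points
    return x

# Toy usage: two well-separated clusters; start from the best data point and refine with mean-shift.
rng = np.random.default_rng(1)
P = np.vstack([rng.normal(0.0, 0.3, (200, 2)), rng.normal(3.0, 0.3, (100, 2))])
start = P[np.argmax([kde_value(P, p) for p in P])]
print(kde_value(P, start), kde_value(P, mean_shift(P, start)))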

Cite as

Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, and Wai Ming Tai. Finding an Approximate Mode of a Kernel Density Estimate. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{lee_et_al:LIPIcs.ESA.2021.61,
  author =	{Lee, Jasper C.H. and Li, Jerry and Musco, Christopher and Phillips, Jeff M. and Tai, Wai Ming},
  title =	{{Finding an Approximate Mode of a Kernel Density Estimate}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{61:1--61:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.61},
  URN =		{urn:nbn:de:0030-drops-146428},
  doi =		{10.4230/LIPIcs.ESA.2021.61},
  annote =	{Keywords: Kernel density estimation, Dimensionality reduction, Coresets, Means-shift}
}
