Search Results

Documents authored by Tai, Wai Ming


Document
Optimal Coreset for Gaussian Kernel Density Estimation

Authors: Wai Ming Tai

Published in: LIPIcs, Volume 224, 38th International Symposium on Computational Geometry (SoCG 2022)


Abstract
Given a point set P ⊂ ℝ^d, the kernel density estimate of P is defined as 𝒢_P(x) = 1/|P| ∑_{p ∈ P} e^{-∥x-p∥²} for any x ∈ ℝ^d. We study how to construct a small subset Q of P such that the kernel density estimate of P is approximated by the kernel density estimate of Q. Such a subset Q is called a coreset. The main technique in this work is constructing a ±1 coloring of the point set P via discrepancy theory, leveraging Banaszczyk's Theorem. When d > 1 is a constant, our construction gives a coreset of size O(1/ε), as opposed to the previous best-known bound of O((1/ε)√(log(1/ε))). This is the first result to break the √(log(1/ε)) barrier, even when d = 2.
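The following minimal sketch illustrates the object being approximated: it evaluates the Gaussian KDE of P and of a subset Q at sample query points and reports the largest deviation. The subset here is a uniform random sample, used only as a stand-in baseline; the paper's coreset is built by a discrepancy-based ±1 coloring, which is not reproduced here, and the sizes and data are illustrative assumptions.

import numpy as np

def gaussian_kde(P, x):
    # KDE value (1/|P|) * sum_p exp(-||x - p||^2) at a single query x.
    return np.mean(np.exp(-np.sum((P - x) ** 2, axis=1)))

rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 2))                    # point set in R^2
Q = P[rng.choice(len(P), size=200, replace=False)]  # naive stand-in "coreset"

queries = rng.normal(size=(50, 2))
err = max(abs(gaussian_kde(P, x) - gaussian_kde(Q, x)) for x in queries)
print(f"max |KDE_P - KDE_Q| over sample queries: {err:.4f}")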

Cite as

Wai Ming Tai. Optimal Coreset for Gaussian Kernel Density Estimation. In 38th International Symposium on Computational Geometry (SoCG 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 224, pp. 63:1-63:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{tai:LIPIcs.SoCG.2022.63,
  author =	{Tai, Wai Ming},
  title =	{{Optimal Coreset for Gaussian Kernel Density Estimation}},
  booktitle =	{38th International Symposium on Computational Geometry (SoCG 2022)},
  pages =	{63:1--63:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-227-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{224},
  editor =	{Goaoc, Xavier and Kerber, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2022.63},
  URN =		{urn:nbn:de:0030-drops-160719},
  doi =		{10.4230/LIPIcs.SoCG.2022.63},
  annote =	{Keywords: Discrepancy Theory, Kernel Density Estimation, Coreset}
}
Document
Finding an Approximate Mode of a Kernel Density Estimate

Authors: Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, and Wai Ming Tai

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)


Abstract
Given points P = {p₁,…,p_n} ⊂ ℝ^d, how do we find a point x which approximately maximizes the function 1/n ∑_{p_i ∈ P} e^{-‖p_i-x‖²}? In other words, how do we find an approximate mode of a Gaussian kernel density estimate (KDE) of P? Given the power of KDEs in representing probability distributions and other continuous functions, this basic mode-finding problem is widely applicable. However, it is poorly understood algorithmically. We provide fast and provably accurate approximation algorithms for mode finding in both the low- and high-dimensional settings. For low (constant) dimension, our main contribution is a reduction to solving systems of polynomial inequalities. For high dimension, we prove the first dimensionality-reduction result for KDE mode finding. The latter result leverages Johnson-Lindenstrauss projection, Kirszbraun's classic extension theorem, and, perhaps surprisingly, the mean-shift heuristic for mode finding. For a constant approximation factor, these algorithms run in O(n (log n)^{O(d)}) and O(nd + (log n)^{O(log³ n)}) time, respectively; more precisely, each comes with a (1+ε)-approximation guarantee. Furthermore, for the special case of d = 2, we give a combinatorial algorithm running in O(n log² n) time. We empirically demonstrate that the random projection approach and the 2-dimensional algorithm improve over state-of-the-art mode-finding heuristics.
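As a concrete illustration of the mean-shift heuristic named in the abstract, the following minimal sketch iterates the standard fixed-point update for the Gaussian KDE 1/n ∑_i e^{-‖p_i-x‖²}: each step moves x to a kernel-weighted average of the points. This is the classical heuristic only, not the paper's provably accurate algorithms; the data, starting point, and tolerances are illustrative assumptions.

import numpy as np

def mean_shift_step(P, x):
    # One mean-shift update: weights are the Gaussian kernel values at x.
    w = np.exp(-np.sum((P - x) ** 2, axis=1))
    return (w[:, None] * P).sum(axis=0) / w.sum()

def approximate_mode(P, x0, iters=100, tol=1e-8):
    x = x0
    for _ in range(iters):
        x_new = mean_shift_step(P, x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

rng = np.random.default_rng(1)
P = np.vstack([rng.normal(0, 0.3, (500, 2)), rng.normal(3, 0.3, (100, 2))])
print("approximate mode:", approximate_mode(P, P[0]))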

Cite as

Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, and Wai Ming Tai. Finding an Approximate Mode of a Kernel Density Estimate. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


BibTeX

@InProceedings{lee_et_al:LIPIcs.ESA.2021.61,
  author =	{Lee, Jasper C.H. and Li, Jerry and Musco, Christopher and Phillips, Jeff M. and Tai, Wai Ming},
  title =	{{Finding an Approximate Mode of a Kernel Density Estimate}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{61:1--61:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.61},
  URN =		{urn:nbn:de:0030-drops-146428},
  doi =		{10.4230/LIPIcs.ESA.2021.61},
  annote =	{Keywords: Kernel density estimation, Dimensionality reduction, Coresets, Mean-shift}
}
Document
RANDOM
The GaussianSketch for Almost Relative Error Kernel Distance

Authors: Jeff M. Phillips and Wai Ming Tai

Published in: LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)


Abstract
We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance between point sets. These sketches yield almost (1+ε)-relative error, but with a small additive α term. In the first variant, the dependence on 1/α is poly-logarithmic, but the polynomial dependence on the original dimension d is of higher degree. In the second variant, the dependence on 1/α is still poly-logarithmic, but the dependence on d is linear.
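To make the truncation idea concrete, here is a minimal sketch of the uncompressed version: expand e^{-‖x-y‖²} = e^{-‖x‖²} e^{-‖y‖²} ∑_k (2⟨x,y⟩)^k / k!, truncate at degree q, and realize each ⟨x,y⟩^k as an inner product of k-fold tensor powers. This builds only the explicit truncated feature map; the paper's sketches additionally compress the tensor powers via the RecursiveTensorSketch, which is omitted here, and the truncation degree q = 8 is an illustrative assumption.

import math
import numpy as np

def truncated_gaussian_features(x, q=8):
    # Explicit feature map phi(x) with <phi(x), phi(y)> ~ exp(-||x-y||^2).
    blocks = []
    t = np.array([1.0])                 # x^{tensor 0}
    for k in range(q + 1):
        blocks.append(math.sqrt(2.0 ** k / math.factorial(k)) * t)
        t = np.kron(t, x)               # x^{tensor (k+1)}, flattened
    return math.exp(-np.dot(x, x)) * np.concatenate(blocks)

x, y = np.array([0.3, -0.2]), np.array([0.1, 0.4])
approx = np.dot(truncated_gaussian_features(x), truncated_gaussian_features(y))
exact = math.exp(-np.sum((x - y) ** 2))
print(f"exact {exact:.6f} vs truncated {approx:.6f}")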

Cite as

Jeff M. Phillips and Wai Ming Tai. The GaussianSketch for Almost Relative Error Kernel Distance. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{phillips_et_al:LIPIcs.APPROX/RANDOM.2020.12,
  author =	{Phillips, Jeff M. and Tai, Wai Ming},
  title =	{{The GaussianSketch for Almost Relative Error Kernel Distance}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-164-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{176},
  editor =	{Byrka, Jaros{\l}aw and Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.12},
  URN =		{urn:nbn:de:0030-drops-126150},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2020.12},
  annote =	{Keywords: Kernel Distance, Kernel Density Estimation, Sketching}
}
Document
Near-Optimal Coresets of Kernel Density Estimates

Authors: Jeff M. Phillips and Wai Ming Tai

Published in: LIPIcs, Volume 99, 34th International Symposium on Computational Geometry (SoCG 2018)


Abstract
We construct near-optimal coresets for kernel density estimates of points in ℝ^d when the kernel is positive definite. Specifically, we show a polynomial-time construction for a coreset of size O(√(d log(1/ε))/ε), and we show a near-matching lower bound of Ω(√d/ε). The upper bound is an improvement polynomial in 1/ε when d ∈ [3, 1/ε²) (for all kernels except the Gaussian kernel, which had a previous upper bound of O((1/ε) log^d(1/ε))), and the lower bound is the first known lower bound for this problem that depends on d. Moreover, the upper bound's restriction that the kernel be positive definite is significant in that it covers a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.
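For intuition, the following sketch computes the quantity such coreset bounds control for a positive definite kernel: the kernel distance ‖μ_P − μ_Q‖ between the uniform measures on P and on a candidate Q, computable from pairwise kernel sums. For a kernel with K(x,x) = 1, this distance upper-bounds the pointwise KDE error |KDE_P(x) − KDE_Q(x)| everywhere. Shown for the Gaussian kernel, with Q a random subset rather than the paper's construction; sizes and data are illustrative.

import numpy as np

def mean_kernel(A, B):
    # Mean of K(a, b) = exp(-||a - b||^2) over all pairs (a in A, b in B).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq).mean()

def kernel_distance(P, Q):
    # ||mu_P - mu_Q||_K; small distance => KDE of Q close to KDE of P everywhere.
    gap = mean_kernel(P, P) - 2 * mean_kernel(P, Q) + mean_kernel(Q, Q)
    return np.sqrt(max(gap, 0.0))       # guard tiny negative round-off

rng = np.random.default_rng(2)
P = rng.normal(size=(2000, 3))
Q = P[rng.choice(len(P), 100, replace=False)]
print(f"kernel distance between P and Q: {kernel_distance(P, Q):.4f}")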

Cite as

Jeff M. Phillips and Wai Ming Tai. Near-Optimal Coresets of Kernel Density Estimates. In 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 99, pp. 66:1-66:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{phillips_et_al:LIPIcs.SoCG.2018.66,
  author =	{Phillips, Jeff M. and Tai, Wai Ming},
  title =	{{Near-Optimal Coresets of Kernel Density Estimates}},
  booktitle =	{34th International Symposium on Computational Geometry (SoCG 2018)},
  pages =	{66:1--66:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-066-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{99},
  editor =	{Speckmann, Bettina and T\'{o}th, Csaba D.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2018.66},
  URN =		{urn:nbn:de:0030-drops-87797},
  doi =		{10.4230/LIPIcs.SoCG.2018.66},
  annote =	{Keywords: Coresets, Kernel Density Estimate, Discrepancy}
}