Near-Optimal Coresets of Kernel Density Estimates

Phillips, Jeff M.; Tai, Wai Ming

doi:10.4230/LIPIcs.SoCG.2018.66

Cite As

Jeff M. Phillips and Wai Ming Tai. Near-Optimal Coresets of Kernel Density Estimates. In 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 99, pp. 66:1-66:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.SoCG.2018.66

Abstract

We construct near-optimal coresets for kernel density estimates of points in R^d when the kernel is positive definite. Specifically, we show a polynomial-time construction for a coreset of size O(sqrt{d log (1/epsilon)}/epsilon), and we show a near-matching lower bound of size Omega(sqrt{d}/epsilon). The upper bound is a polynomial-in-1/epsilon improvement when d in [3, 1/epsilon^2) (for all kernels except the Gaussian kernel, which had a previous upper bound of O((1/epsilon) log^d (1/epsilon))), and the lower bound is the first known lower bound to depend on d for this problem. Moreover, the upper bound's restriction that the kernel be positive definite is significant in that it applies to a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.
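
To make the quantity in the abstract concrete, the following is a minimal sketch of how the error of a candidate coreset might be measured. It assumes the usual definition, which the abstract does not spell out: a subset Q of P is an epsilon-coreset when the largest gap between the two kernel density estimates, max over x of |KDE_P(x) - KDE_Q(x)|, is at most epsilon. The function names, the unit-bandwidth Gaussian kernel, and the finite query set are all illustrative assumptions, and the subset here is a uniform random sample used as a baseline, not the construction from the paper.

```python
import math
import random

def gaussian_kernel(x, p):
    """Gaussian kernel K(x, p) = exp(-||x - p||^2); positive definite, K(x, x) = 1."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)))

def kde(points, x):
    """Kernel density estimate at x: the average kernel value over the points."""
    return sum(gaussian_kernel(x, p) for p in points) / len(points)

def kde_error(P, Q, queries):
    """Largest |KDE_P(x) - KDE_Q(x)| over a finite set of query points.

    The true error is a supremum over all x in R^d, so this value is
    only a lower estimate of it.
    """
    return max(abs(kde(P, x) - kde(Q, x)) for x in queries)

rng = random.Random(0)
d = 3
P = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(200)]

# Baseline only: a uniform random sample, NOT the paper's construction.
Q = rng.sample(P, 50)
err = kde_error(P, Q, queries)
print(f"|P| = {len(P)}, |Q| = {len(Q)}, estimated error = {err:.4f}")
```

Because the kernel values lie in (0, 1], the measured error always lies in [0, 1]. Comparing this estimate across subset sizes gives a rough feel for the size-versus-epsilon trade-off that the stated bounds describe.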

