# Finding an Approximate Mode of a Kernel Density Estimate

## File

LIPIcs.ESA.2021.61.pdf
• Filesize: 0.77 MB
• 19 pages

## Cite As

Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, and Wai Ming Tai. Finding an Approximate Mode of a Kernel Density Estimate. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ESA.2021.61

## Abstract

Given points P = {p₁,...,p_n} ⊂ ℝ^d, how do we find a point x which approximately maximizes the function 1/n ∑_{p_i ∈ P} e^{-‖p_i-x‖²}? In other words, how do we find an approximate mode of a Gaussian kernel density estimate (KDE) of P? Given the power of KDEs in representing probability distributions and other continuous functions, the basic mode finding problem is widely applicable. However, it is poorly understood algorithmically. We provide fast and provably accurate approximation algorithms for mode finding in both the low and high dimensional settings. For low (constant) dimension, our main contribution is a reduction to solving systems of polynomial inequalities. For high dimension, we prove the first dimensionality reduction result for KDE mode finding. The latter result leverages Johnson-Lindenstrauss projection, Kirszbraun's classic extension theorem, and perhaps surprisingly, the mean-shift heuristic for mode finding. For constant approximation factor these algorithms run in O(n (log n)^{O(d)}) and O(nd + (log n)^{O(log³ n)}) time, respectively; both bounds are proven more precisely as (1+ε)-approximation guarantees. Furthermore, for the special case of d = 2, we give a combinatorial algorithm running in O(n log² n) time. We empirically demonstrate that the random projection approach and the 2-dimensional algorithm improve over the state-of-the-art mode-finding heuristics.
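The objective and the two key ingredients named in the abstract — the mean-shift heuristic and Johnson-Lindenstrauss projection — can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's algorithm: the function names are ours, and it uses the unit-width kernel e^{-‖p-x‖²} exactly as written in the abstract.

```python
import numpy as np

def kde(P, x):
    """KDE value (1/n) * sum_i exp(-||p_i - x||^2) at a query point x."""
    return np.mean(np.exp(-np.sum((P - x) ** 2, axis=1)))

def mean_shift(P, x0, iters=100):
    """Mean-shift: repeatedly move x to the Gaussian-weighted average of
    the points. Each step is guaranteed not to decrease the KDE value."""
    x = x0.copy()
    for _ in range(iters):
        w = np.exp(-np.sum((P - x) ** 2, axis=1))  # kernel weights
        x = (w @ P) / w.sum()                      # weighted mean of P
    return x

def jl_project(P, k, rng):
    """Johnson-Lindenstrauss projection to k dimensions via a random
    Gaussian matrix, preserving pairwise distances up to (1 ± ε) w.h.p."""
    d = P.shape[1]
    G = rng.normal(size=(d, k)) / np.sqrt(k)
    return P @ G
```

A common heuristic use (which the paper compares against, rather than proposes) is to run `mean_shift` from every data point, or from points of a projected copy `jl_project(P, k, rng)`, and keep the candidate with the largest `kde` value.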

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Design and analysis of algorithms
##### Keywords
• Kernel density estimation
• Dimensionality reduction
• Coresets
• Mean-shift

## References

1. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003.
2. Pankaj K. Agarwal, Haim Kaplan, and Micha Sharir. Union of random Minkowski sums and network vulnerability analysis. In Proceedings of the twenty-ninth annual symposium on Computational geometry, pages 177-186. ACM, 2013.
3. Nir Ailon and Bernard Chazelle. The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., pages 302-322, 2009.
4. Nir Ailon and Edo Liberty. An almost optimal unrestricted fast Johnson-Lindenstrauss transform. ACM Trans. Algorithms, 9(3):21:1-21:12, 2013.
5. Carlos Améndola, Alexander Engström, and Christian Haase. Maximum number of modes of Gaussian mixtures. Information and Inference: A Journal of the IMA, 9(3):587-600, 2020.
6. Arturs Backurs, Moses Charikar, Piotr Indyk, and Paris Siminelakis. Efficient density evaluation for smooth kernels. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 615-626. IEEE, 2018.
7. Arturs Backurs, Piotr Indyk, and Tal Wagner. Space and time efficient kernel density estimation in high dimensions. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 15799-15808, 2019.
8. Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1039-1050, 2019.
9. Miguel Á. Carreira-Perpiñán. Mode-finding for mixtures of Gaussian distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1318-1323, 2000.
10. Miguel Á. Carreira-Perpiñán. Gaussian mean-shift is an EM algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):767-776, 2007.
11. Miguel Á. Carreira-Perpiñán and Christopher K. I. Williams. On the number of modes of a Gaussian mixture. In International Conference on Scale-Space Theories in Computer Vision, pages 625-640. Springer, 2003.
12. Timothy M Chan. Klee’s measure problem made easy. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 410-419. IEEE, 2013.
13. Cheng Chang and R. Ansari. Kernel particle filter for visual tracking. IEEE Signal Processing Letters, 12:242-245, 2005.
14. Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Robust topological inference: Distance-to-a-measure and kernel distance. Technical report, arXiv:1412.7197, 2014.
15. Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 163-172, 2015.
16. Michael B. Cohen, T. S. Jayram, and Jelani Nelson. Simple analyses of the sparse Johnson-Lindenstrauss transform. In The 1st Symposium on Simplicity in Algorithms, pages 15:1-15:9, 2018.
17. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60-65, 2003.
18. Luc Devroye and László Györfi. Nonparametric Density Estimation: The L₁ View. Wiley, 1984.
19. Luc Devroye and Gábor Lugosi. Combinatorial Methods in Density Estimation. Springer-Verlag, 2001.
20. Herbert Edelsbrunner, Brittany Terese Fasy, and Günter Rote. Add isotropic Gaussian kernels at own risk: More and more resilient modes in higher dimensions. In Proceedings of the 28th Annual Symposium on Computational Geometry, pages 91-100, 2012.
21. Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel Bayes' rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 2013.
22. Theo Gasser, Peter Hall, and Brett Presnell. Nonparametric estimation of the mode of a distribution of random curves. Journal of the Royal Statistical Society: Series B, 60:681-691, 1997.
23. Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723-773, 2012.
24. Mingxuan Han, Michael Matheny, and Jeff M. Phillips. The kernel spatial scan statistic. In ACM International Conference on Advances in Geographic Information Systems, 2019.
25. Sariel Har-Peled and Micha Sharir. Relative (p, ε)-approximations in geometry. Discrete & Computational Geometry, 45(3):462-496, 2011.
26. Chi Jin, Yuchen Zhang, Sivaraman Balakrishnan, Martin J. Wainwright, and Michael Jordan. Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. In NeurIPS, 2016.
27. George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.
28. Sarang Joshi, Raj Varma Kommaraji, Jeff M. Phillips, and Suresh Venkatasubramanian. Comparing distributions and shapes using the kernel distance. In Proceedings of the twenty-seventh annual symposium on Computational geometry, pages 47-56. ACM, 2011.
29. Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM, 61(1):4, 2014.
30. M. Kirszbraun. Über die zusammenziehende und Lipschitzsche Transformationen. Fundamenta Mathematicae, 22(1):77-108, 1934.
31. Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM Journal on Mathematical Analysis, 43(3):1269-1281, 2011.
32. Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62:516-527, 2001.
33. Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
34. David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya Tolstikhin. Towards a learning theory of cause-effect inference. In International Conference on Machine Learning, 2015.
35. Konstantin Makarychev, Yury Makarychev, and Ilya Razenshteyn. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1027-1038, 2019.
36. Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10:1-141, 2017.
37. Shyam Narayanan and Jelani Nelson. Optimal terminal dimensionality reduction in Euclidean space. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 1064-1069, 2019.
38. Jeff M. Phillips and Wai Ming Tai. Improved coresets for kernel density estimates. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2718-2727. SIAM, 2018.
39. Jeff M. Phillips and Wai Ming Tai. Near-optimal coresets of kernel density estimates. In 34th International Symposium on Computational Geometry (SoCG 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.
40. Jeff M. Phillips, Bei Wang, and Yan Zheng. Geometric inference on kernel density estimates. In International Symposium on Computational Geometry, 2015.
41. Kent Quanrud. Spectral sparsification of metrics and kernels. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1445-1464. SIAM, 2021.
42. James Renegar. On the computational complexity of approximating solutions for real algebraic formulae. SIAM Journal on Computing, 21(6):1008-1025, 1992.
43. Alessandro Rinaldo, Larry Wasserman, et al. Generalized density clustering. The Annals of Statistics, 38(5):2678-2722, 2010.
44. Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
45. David W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, 1992.
46. Chunhua Shen, Michael J. Brooks, and Anton van den Hengel. Fast global kernel density mode seeking: Applications to localization and tracking. IEEE Transactions on Image Processing, 16:1457-1469, 2007.
47. Bernard W. Silverman. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society: Series B, 43:97-99, 1981.
48. Bernard W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.
49. Alex J. Smola, Arthur Gretton, Le Song, and Bernhard Schölkopf. A Hilbert space embedding for distributions. In Proceedings of Algorithmic Learning Theory, 2007.
50. Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517-1561, 2010.
51. F. A. Valentine. A lipschitz condition preserving extension for a vector function. American Journal of Mathematics, 67(1):83-93, 1945.
52. Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16:264-280, 1971.
53. Grace Wahba. Support vector machines, reproducing kernel Hilbert spaces, and the randomized GACV. In Advances in Kernel Methods - Support Vector Learning (Bernhard Schölkopf, Alexander J. Smola, Christopher J. C. Burges, and Rosanna Soentpiet, editors), pages 69-88. MIT Press, 1999.
54. Yan Zheng and Jeff M. Phillips. L∞ error and bandwidth selection for kernel density estimates of large data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1533-1542. ACM, 2015.
55. Yan Zheng and Jeff M. Phillips. Coresets for kernel regression. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 645-654. ACM, 2017.
56. Shaofeng Zou, Yingbin Liang, H Vincent Poor, and Xinghua Shi. Unsupervised nonparametric anomaly detection: A kernel method. In 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 836-841. IEEE, 2014.