Finding an Approximate Mode of a Kernel Density Estimate

Authors: Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, Wai Ming Tai



  • Filesize: 0.77 MB
  • 19 pages


Author Details

Jasper C.H. Lee
  • Brown University, Providence, RI, USA
Jerry Li
  • Microsoft Research, Redmond, WA, USA
Christopher Musco
  • New York University, NY, USA
Jeff M. Phillips
  • University of Utah, Salt Lake City, UT, USA
Wai Ming Tai
  • University of Chicago, IL, USA

Cite As

Jasper C.H. Lee, Jerry Li, Christopher Musco, Jeff M. Phillips, and Wai Ming Tai. Finding an Approximate Mode of a Kernel Density Estimate. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Given points P = {p₁,...,p_n} ⊂ ℝ^d, how do we find a point x which approximately maximizes the function (1/n) ∑_{p_i ∈ P} e^{-‖p_i-x‖²}? In other words, how do we find an approximate mode of a Gaussian kernel density estimate (KDE) of P? Given the power of KDEs in representing probability distributions and other continuous functions, this basic mode-finding problem is widely applicable; algorithmically, however, it is poorly understood. We provide fast and provably accurate approximation algorithms for mode finding in both the low- and high-dimensional settings. For low (constant) dimension, our main contribution is a reduction to solving systems of polynomial inequalities. For high dimension, we prove the first dimensionality-reduction result for KDE mode finding. The latter result leverages Johnson-Lindenstrauss projection, Kirszbraun's classic extension theorem, and, perhaps surprisingly, the mean-shift heuristic for mode finding. For a constant approximation factor, these algorithms run in O(n (log n)^{O(d)}) and O(nd + (log n)^{O(log³ n)}) time, respectively; more precisely, we prove (1+ε)-approximation guarantees. Furthermore, for the special case of d = 2, we give a combinatorial algorithm running in O(n log² n) time. We empirically demonstrate that the random-projection approach and the 2-dimensional algorithm improve over state-of-the-art mode-finding heuristics.
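To make the objective and the mean-shift heuristic from the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithms: the function names are our own, and the bandwidth is fixed at 1 to match the e^{-‖p_i-x‖²} kernel in the abstract.

```python
import numpy as np

def kde_value(P, x):
    """Gaussian KDE of the point set P at x: (1/n) * sum_i exp(-||p_i - x||^2)."""
    return np.mean(np.exp(-np.sum((P - x) ** 2, axis=1)))

def mean_shift(P, x0, iters=100):
    """Mean-shift heuristic: repeatedly move x to the kernel-weighted
    average of P. Each step does not decrease the KDE value, but the
    iteration may stop at a local mode rather than the global one."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        w = np.exp(-np.sum((P - x) ** 2, axis=1))  # Gaussian kernel weights
        x = w @ P / w.sum()                        # weighted mean of the points
    return x
```

A common baseline is to start mean-shift from each of the n input points and keep the iterate with the largest KDE value; the paper's contribution is algorithms with provable (1+ε)-approximation guarantees rather than such heuristics.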

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms

Keywords
  • Kernel density estimation
  • Dimensionality reduction
  • Coresets
  • Mean-shift



References

  1. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003.
  2. Pankaj K. Agarwal, Haim Kaplan, and Micha Sharir. Union of random Minkowski sums and network vulnerability analysis. In Proceedings of the Twenty-Ninth Annual Symposium on Computational Geometry, pages 177-186. ACM, 2013.
  3. Nir Ailon and Bernard Chazelle. The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., pages 302-322, 2009.
  4. Nir Ailon and Edo Liberty. An almost optimal unrestricted fast Johnson-Lindenstrauss transform. ACM Trans. Algorithms, 9(3):21:1-21:12, 2013.
  5. Carlos Améndola, Alexander Engström, and Christian Haase. Maximum number of modes of Gaussian mixtures. Information and Inference: A Journal of the IMA, 9(3):587-600, 2020.
  6. Arturs Backurs, Moses Charikar, Piotr Indyk, and Paris Siminelakis. Efficient density evaluation for smooth kernels. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 615-626. IEEE, 2018.
  7. Arturs Backurs, Piotr Indyk, and Tal Wagner. Space and time efficient kernel density estimation in high dimensions. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 15799-15808, 2019.
  8. Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1039-1050, 2019.
  9. Miguel Á. Carreira-Perpiñán. Mode-finding for mixtures of Gaussian distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1318-1323, 2000.
  10. Miguel Á. Carreira-Perpiñán. Gaussian mean-shift is an EM algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):767-776, 2007.
  11. Miguel Á. Carreira-Perpiñán and Christopher K. I. Williams. On the number of modes of a Gaussian mixture. In International Conference on Scale-Space Theories in Computer Vision, pages 625-640. Springer, 2003.
  12. Timothy M. Chan. Klee's measure problem made easy. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 410-419. IEEE, 2013.
  13. Cheng Chang and R. Ansari. Kernel particle filter for visual tracking. IEEE Signal Processing Letters, 12:242-245, 2005.
  14. Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Robust topological inference: Distance-to-a-measure and kernel distance. Technical report, arXiv:1412.7197, 2014.
  15. Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 163-172, 2015.
  16. Michael B. Cohen, T. S. Jayram, and Jelani Nelson. Simple analyses of the sparse Johnson-Lindenstrauss transform. In The 1st Symposium on Simplicity in Algorithms, pages 15:1-15:9, 2018.
  17. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60-65, 2003.
  18. Luc Devroye and László Györfi. Nonparametric Density Estimation: The L₁ View. Wiley, 1984.
  19. Luc Devroye and Gábor Lugosi. Combinatorial Methods in Density Estimation. Springer-Verlag, 2001.
  20. Herbert Edelsbrunner, Brittany Terese Fasy, and Günter Rote. Add isotropic Gaussian kernels at own risk: More and more resilient modes in higher dimensions. Proceedings of the 28th Annual Symposium on Computational Geometry, pages 91-100, 2012.
  21. Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel Bayes' rule: Bayesian inference with positive definite kernels. Journal of Machine Learning Research, 2013.
  22. Theo Gasser, Peter Hall, and Brett Presnell. Nonparametric estimation of the mode of a distribution of random curves. Journal of the Royal Statistical Society: Series B, 60:681-691, 1997.
  23. Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723-773, 2012.
  24. Mingxuan Han, Michael Matheny, and Jeff M. Phillips. The kernel spatial scan statistic. In ACM International Conference on Advances in Geographic Information Systems, 2019.
  25. Sariel Har-Peled and Micha Sharir. Relative (p, ε)-approximations in geometry. Discrete & Computational Geometry, 45(3):462-496, 2011.
  26. Chi Jin, Yuchen Zhang, Sivaraman Balakrishnan, Martin J. Wainwright, and Michael Jordan. Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. In NeurIPS, 2016.
  27. George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.
  28. Sarang Joshi, Raj Varma Kommaraji, Jeff M. Phillips, and Suresh Venkatasubramanian. Comparing distributions and shapes using the kernel distance. In Proceedings of the Twenty-Seventh Annual Symposium on Computational Geometry, pages 47-56. ACM, 2011.
  29. Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM, 61(1):4, 2014.
  30. M. Kirszbraun. Über die zusammenziehende und Lipschitzsche Transformationen. Fundamenta Mathematicae, 22(1):77-108, 1934.
  31. Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM Journal on Mathematical Analysis, 43(3):1269-1281, 2011.
  32. Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62:516-527, 2001.
  33. Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.
  34. David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, and Ilya Tolstikhin. Towards a learning theory of cause-effect inference. In International Conference on Machine Learning, 2015.
  35. Konstantin Makarychev, Yury Makarychev, and Ilya Razenshteyn. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1027-1038, 2019.
  36. Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10:1-141, 2017.
  37. Shyam Narayanan and Jelani Nelson. Optimal terminal dimensionality reduction in Euclidean space. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 1064-1069, 2019.
  38. Jeff M. Phillips and Wai Ming Tai. Improved coresets for kernel density estimates. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2718-2727. SIAM, 2018.
  39. Jeff M. Phillips and Wai Ming Tai. Near-optimal coresets of kernel density estimates. In 34th International Symposium on Computational Geometry (SoCG 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.
  40. Jeff M. Phillips, Bei Wang, and Yan Zheng. Geometric inference on kernel density estimates. In International Symposium on Computational Geometry, 2015.
  41. Kent Quanrud. Spectral sparsification of metrics and kernels. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1445-1464. SIAM, 2021.
  42. James Renegar. On the computational complexity of approximating solutions for real algebraic formulae. SIAM Journal on Computing, 21(6):1008-1025, 1992.
  43. Alessandro Rinaldo and Larry Wasserman. Generalized density clustering. The Annals of Statistics, 38(5):2678-2722, 2010.
  44. Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
  45. David W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, 1992.
  46. Chunhua Shen, Michael J. Brooks, and Anton van den Hengel. Fast global kernel density mode seeking: Applications to localization and tracking. IEEE Transactions on Image Processing, 16:1457-1469, 2007.
  47. Bernard W. Silverman. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society: Series B, 43:97-99, 1981.
  48. Bernard W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.
  49. Alex J. Smola, Arthur Gretton, Le Song, and Bernhard Schölkopf. A Hilbert space embedding for distributions. In Proceedings of Algorithmic Learning Theory, 2007.
  50. Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517-1561, 2010.
  51. F. A. Valentine. A Lipschitz condition preserving extension for a vector function. American Journal of Mathematics, 67(1):83-93, 1945.
  52. Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264-280, 1971.
  53. Grace Wahba. Support vector machines, reproducing kernel Hilbert spaces, and the randomized GACV. In Bernhard Schölkopf, Alexander J. Smola, Christopher J. C. Burges, and Rosanna Soentpiet, editors, Advances in Kernel Methods - Support Vector Learning, pages 69-88, 1999.
  54. Yan Zheng and Jeff M. Phillips. L∞ error and bandwidth selection for kernel density estimates of large data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1533-1542. ACM, 2015.
  55. Yan Zheng and Jeff M. Phillips. Coresets for kernel regression. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 645-654. ACM, 2017.
  56. Shaofeng Zou, Yingbin Liang, H. Vincent Poor, and Xinghua Shi. Unsupervised nonparametric anomaly detection: A kernel method. In 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 836-841. IEEE, 2014.