The GaussianSketch for Almost Relative Error Kernel Distance

Phillips, Jeff M.; Tai, Wai Ming

doi:10.4230/LIPIcs.APPROX/RANDOM.2020.12

Abstract

We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel, and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance between points sets. These sketches yield almost (1+ε)-relative error, but with a small additive α term. In the first variants the dependence on 1/α is poly-logarithmic, but has higher degree of polynomial dependence on the original dimension d. In the second variant, the dependence on 1/α is still poly-logarithmic, but the dependence on d is linear.

Dmitris Achlioptas. Database-friendly random projections: Johnson-lindenstrauss with binary coins. Journal of Comp. & Sys. Sci., 66:671-687, 2003.
Thomas D Ahle, Michael Kapralov, Jakob BT Knudsen, Rasmus Pagh, Ameya Velingker, David P Woodruff, and Amir Zandieh. Oblivious sketching of high-degree polynomial kernels. In SODA, 2020.
Nir Ailon and Edo Liberty. Fast dimension reduction using rademacher series on dual bch codes. Discrete & Computational Geometry, 42(615), 2009.
Nir Ailon and Edo Liberty. An almost optimal unrestricted fast johnson-lindenstrauss transform. In SODA, 2011.
Helmut Alt and Leonidas J. Guibas. Discrete geometric shapes: Matching, interpolation, and approximation: A survey, 1996. In Handbook of Computational Geometry.
Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, 2006.
Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. Earth mover distance over high-dimensional spaces. In SODA, 2008.
N. Aronszajn. Theory of reproducing kernels. Trans. AMS, 68:337-404, 1950. URL: http://www.jstor.org/stable/1990404.
Haim Avron, Michael Kapralov, Cameron Musco, Chistopher Musco, Ameya Velingker, and Amir Zandier. Random fourier features for kernel ridge regression: Approximation bounds and statistical guarantees. In ICML, 2017.
Haim Avron, Huy L. Nguyen, and David P. Woodruff. Subspace embeddings for the polynomial kernel. In NIPS, 2014.
Khanh Do Ba, Huy L. Nguyen, Huy N. Nguyen, and Ronnit Rubinfeld. Sublinear time algorithms for earth mover’s distance. Theory Comput Syst, 48:428-442, 2011.
Moses Charikar and Paris Siminelakis. Hashing-based-estimators for kernel density in high dimensions. In FOCS, 2017.
Edgar Chavez, Ana C. Chávez Cáliz, and Jorge L. López-López. Affine invariants of generalized polygons and matching under affine transformations. Computational Geometry: Theory and Applications, 58:60-69, 2017.
Di Chen and Jeff M. Phillips. Relative error embeddings for the gaussian kernel distance. In Algorithmic Learning Theory, 2017.
Kenneth L Clarkson and David P Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 205-214. ACM, 2009.
Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Mădălina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In STOC, 2015.
Andrew Cotter, Joseph Keshet, and Nathan Srebro. Explicit approximations of the gaussian kernel. arXiv preprint, 2011. URL: http://arxiv.org/abs/1109.4603.
Anne Driemel and Francesco Silvestri. Locality-sensitive hashing of curves. In 33rd International Symposium on Computational Geometry, 2017.
Petros Drineas and Michael W Mahoney. On the nyström method for approximating a gram matrix for improved kernel-based learning. The Journal of Machine Learning Research, 6:2153-2175, 2005.
Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, Christian Doppler Laboratory for Expert Systems, 1994.
Thomas Funkhouser, Patrick Min, Michael Kazhdan, Joyce Chen, Alex Halderman, David Dobkin, and David Jacobs. A search engine for 3D models. ACM Transactions on Graphics, 22:83-105, 2003.
Mina Ghashami, Daniel Perry, and Jeff M. Phillips. Streaming kernel principal component analysis. In AIStats, 2016.
Joan Glaunès and Sarang Joshi. Template estimation form unlabeled point set data and surfaces for computational anatomy. In Math. Found. Comp. Anatomy, 2006.
Arthur Gretton, Marsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alex J. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723-773, 2012.
William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984.
Sarang Joshi, Raj Varma Kommaraju, Jeff M. Phillips, and Suresh Venkatasubramanian. Comparing distributions and shapes using the kurrent distance. In Proceedings 27th Annual Symposium on Computational Geometry, 2011. URL: http://arxiv.org/abs/1001.0591.
Ravi Kannan, Santosh Vempala, and David Woodruff. Principal component analysis and higher correlations for distributed data. In Conference on Learning Theory, pages 1040-1057, 2014.
Kasper Green Larsen and Jelani Nelson. Optimality of the johnson-lindenstrauss lemma. In FOCS, 2017.
Quoc Le, Tamás Sarlós, and Alex Smola. Fastfood - approximating kernel expansions in loglinear time. In ICML, 2013.
David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. Randomized nonlinear component analysis. ICML, 2014.
J. Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 209:441-458, 1909.
Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. arXiv preprint, 2016. URL: http://arxiv.org/abs/1605.09522.
Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10:1-141, 2017.
Cameron Musco and David Woodruff. Is input sparsity time possible for kernel low-rank approximation? In NeurIPS, 2017.
Cameron Musco and David P. Woodruff. Sublinear time low-rank approximation of positive semidefinite matrices. In FOCS, 2017.
Jeff M. Phillips and Suresh Venkatasubramanian. A gentle introduction to the kernel distance. Technical report, arXiv, 2011. URL: http://arxiv.org/abs/1103.1625.
Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems, pages 1177-1184, 2007.
Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. Equivalence of distance-based and rkhs-based statistics in hypothesis testing. The Annals of Statistics, pages 2263-2291, 2013.
Alex J. Smola, Arthur Gretton, Le Song, and Bernhard Schölkopf. A Hilbert space embedding for distributions. In ICALT, 2007.
Bharath Sriperumbudur et al. On the optimal estimation of probability measures in weak and strong topologies. Bernoulli, 22(3):1839-1893, 2016.
Bharath Sriperumbudur and Nicholas Sterge. Approximate kernel pca using random features: Computational vs. statistical trade-off. Technical report, arXiv, 2018. URL: http://arxiv.org/abs/1706.06296.
Bharath K. Sriperumbudur, Kenji Fukumizu, and Gert R. G. Lanckriet. Universality, characteristic kernels and rkhs embedding of measures. JMLR, pages 2389-2410, 2011.
Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517-1561, 2010.
Enayat Ullah, Poorya Mianjy, Teodor V. Marinov, and Raman Arora. Streaming kernel pca with ̃ o(√n) random features. In NeruIPS, 2018.
Shusen Wang, Alex Gittens, and Michael W. Mahoney. Scalable kernel k-means clustering with nystrom approximation: Relative-error bounds. JMLR, arXiv, (to appear). URL: http://arxiv.org/abs/1706.02803.
David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10:1-157, 2014.
Wojciech Zaremba, Arthur Gretton, and Matthew Blaschko. B-tests: Low variance kernel two-sample tests. In NIPS, 2013.
Ji Zhao and Deyu Meng. Fastmmd: Ensemble of circular discrepancy for efficient two-sample test. Neural Computation, 27:1354-1372, 2015.

The GaussianSketch for Almost Relative Error Kernel Distance

Authors Jeff M. Phillips, Wai Ming Tai

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

The GaussianSketch for Almost Relative Error Kernel Distance

Authors Jeff M. Phillips, Wai Ming Tai

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References