The GaussianSketch for Almost Relative Error Kernel Distance

Authors: Jeff M. Phillips, Wai Ming Tai

File

LIPIcs.APPROX-RANDOM.2020.12.pdf
  • Filesize: 0.65 MB
  • 20 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.12

Author Details

Jeff M. Phillips
  • School of Computing, University of Utah, Salt Lake City, UT, USA
Wai Ming Tai
  • School of Computing, University of Utah, Salt Lake City, UT, USA

Acknowledgements

We thank Rasmus Pagh for early conversations on this topic which helped reignite and motivate this line of thought. Part of the work was completed while the first author was visiting the Simons Institute for Theory of Computing.

Cite As

Jeff M. Phillips and Wai Ming Tai. The GaussianSketch for Almost Relative Error Kernel Distance. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.12

Abstract

We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into a Euclidean inner product space. These work by truncating infinite expansions of the Gaussian kernel, and carefully invoking the RecursiveTensorSketch [Ahle et al. SODA 2020]. After providing concentration and approximation properties of these sketches, we use them to approximate the kernel distance between point sets. These sketches yield almost (1+ε)-relative error, but with a small additive α term. In the first variant, the dependence on 1/α is poly-logarithmic, but it carries a higher-degree polynomial dependence on the original dimension d. In the second variant, the dependence on 1/α is still poly-logarithmic, but the dependence on d is linear.
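To make the truncation idea concrete, below is a minimal Python sketch; it is not the paper's construction. It assumes unit bandwidth, K(x, y) = exp(-||x - y||²/2), and materializes each tensor power explicitly, whereas the paper compresses the tensor powers with the RecursiveTensorSketch of Ahle et al. to avoid the d^t blowup. The helper names truncated_gaussian_features and kernel_distance are illustrative, not the paper's API.

import math
import numpy as np

# The Gaussian kernel factors as
#   K(x, y) = exp(-||x||^2/2) * exp(-||y||^2/2) * sum_{t>=0} <x, y>^t / t!,
# so cutting the Taylor series at degree `deg` yields an explicit feature
# map phi whose inner products approximate K. Illustrative only: the paper
# sketches each tensor power instead of materializing its d^t coordinates.

def truncated_gaussian_features(x, deg):
    """Truncated-Taylor feature map phi(x) with <phi(x), phi(y)> ~ K(x, y)."""
    blocks = []
    for t in range(deg + 1):
        # Flattened tensor power of x: all d^t products x[i1] * ... * x[it].
        block = np.array([1.0])
        for _ in range(t):
            block = np.outer(block, x).ravel()
        blocks.append(block / math.sqrt(math.factorial(t)))
    return math.exp(-np.dot(x, x) / 2.0) * np.concatenate(blocks)

def kernel_distance(P, Q, deg=8):
    """Kernel distance D(P,Q) = sqrt(k(P,P) + k(Q,Q) - 2 k(P,Q)) via summed features."""
    phi_P = sum(truncated_gaussian_features(p, deg) for p in P)
    phi_Q = sum(truncated_gaussian_features(q, deg) for q in Q)
    return np.linalg.norm(phi_P - phi_Q)

# Sanity check: the truncated inner product converges to the exact kernel.
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
approx = truncated_gaussian_features(x, 8) @ truncated_gaussian_features(y, 8)
exact = math.exp(-np.linalg.norm(x - y) ** 2 / 2)
print(approx, exact)  # close once deg comfortably exceeds |<x, y>|

The explicit map has sum_{t<=deg} d^t coordinates, which is exactly the blowup the GaussianSketch avoids; the example only shows that inner products of truncated features reproduce the Gaussian kernel, and that summing feature vectors turns the kernel distance between point sets into a Euclidean norm.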

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Kernel Distance
  • Kernel Density Estimation
  • Sketching


References

  1. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66:671-687, 2003.
  2. Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David P. Woodruff, and Amir Zandieh. Oblivious sketching of high-degree polynomial kernels. In SODA, 2020.
  3. Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. Discrete & Computational Geometry, 42(4):615-630, 2009.
  4. Nir Ailon and Edo Liberty. An almost optimal unrestricted fast Johnson-Lindenstrauss transform. In SODA, 2011.
  5. Helmut Alt and Leonidas J. Guibas. Discrete geometric shapes: Matching, interpolation, and approximation: A survey. In Handbook of Computational Geometry, 1996.
  6. Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, 2006.
  7. Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. Earth mover distance over high-dimensional spaces. In SODA, 2008.
  8. N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337-404, 1950. URL: http://www.jstor.org/stable/1990404.
  9. Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. Random Fourier features for kernel ridge regression: Approximation bounds and statistical guarantees. In ICML, 2017.
  10. Haim Avron, Huy L. Nguyen, and David P. Woodruff. Subspace embeddings for the polynomial kernel. In NIPS, 2014.
  11. Khanh Do Ba, Huy L. Nguyen, Huy N. Nguyen, and Ronitt Rubinfeld. Sublinear time algorithms for earth mover's distance. Theory of Computing Systems, 48:428-442, 2011.
  12. Moses Charikar and Paris Siminelakis. Hashing-based estimators for kernel density in high dimensions. In FOCS, 2017.
  13. Edgar Chavez, Ana C. Chávez Cáliz, and Jorge L. López-López. Affine invariants of generalized polygons and matching under affine transformations. Computational Geometry: Theory and Applications, 58:60-69, 2017.
  14. Di Chen and Jeff M. Phillips. Relative error embeddings for the Gaussian kernel distance. In Algorithmic Learning Theory, 2017.
  15. Kenneth L. Clarkson and David P. Woodruff. Numerical linear algebra in the streaming model. In STOC, pages 205-214, 2009.
  16. Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Mădălina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In STOC, 2015.
  17. Andrew Cotter, Joseph Keshet, and Nathan Srebro. Explicit approximations of the Gaussian kernel. arXiv preprint, 2011. URL: http://arxiv.org/abs/1109.4603.
  18. Anne Driemel and Francesco Silvestri. Locality-sensitive hashing of curves. In 33rd International Symposium on Computational Geometry, 2017.
  19. Petros Drineas and Michael W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6:2153-2175, 2005.
  20. Thomas Eiter and Heikki Mannila. Computing discrete Fréchet distance. Technical report, Christian Doppler Laboratory for Expert Systems, 1994.
  21. Thomas Funkhouser, Patrick Min, Michael Kazhdan, Joyce Chen, Alex Halderman, David Dobkin, and David Jacobs. A search engine for 3D models. ACM Transactions on Graphics, 22:83-105, 2003.
  22. Mina Ghashami, Daniel Perry, and Jeff M. Phillips. Streaming kernel principal component analysis. In AISTATS, 2016.
  23. Joan Glaunès and Sarang Joshi. Template estimation from unlabeled point set data and surfaces for computational anatomy. In Math. Found. Comp. Anatomy, 2006.
  24. Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alex J. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723-773, 2012.
  25. William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984.
  26. Sarang Joshi, Raj Varma Kommaraju, Jeff M. Phillips, and Suresh Venkatasubramanian. Comparing distributions and shapes using the kernel distance. In Proceedings of the 27th Annual Symposium on Computational Geometry, 2011. URL: http://arxiv.org/abs/1001.0591.
  27. Ravi Kannan, Santosh Vempala, and David Woodruff. Principal component analysis and higher correlations for distributed data. In Conference on Learning Theory, pages 1040-1057, 2014.
  28. Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In FOCS, 2017.
  29. Quoc Le, Tamás Sarlós, and Alex Smola. Fastfood: Approximating kernel expansions in loglinear time. In ICML, 2013.
  30. David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, and Bernhard Schölkopf. Randomized nonlinear component analysis. In ICML, 2014.
  31. J. Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 209:441-458, 1909.
  32. Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. arXiv preprint, 2016. URL: http://arxiv.org/abs/1605.09522.
  33. Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10:1-141, 2017.
  34. Cameron Musco and David Woodruff. Is input sparsity time possible for kernel low-rank approximation? In NeurIPS, 2017.
  35. Cameron Musco and David P. Woodruff. Sublinear time low-rank approximation of positive semidefinite matrices. In FOCS, 2017.
  36. Jeff M. Phillips and Suresh Venkatasubramanian. A gentle introduction to the kernel distance. Technical report, arXiv, 2011. URL: http://arxiv.org/abs/1103.1625.
  37. Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In NIPS, pages 1177-1184, 2007.
  38. Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, pages 2263-2291, 2013.
  39. Alex J. Smola, Arthur Gretton, Le Song, and Bernhard Schölkopf. A Hilbert space embedding for distributions. In Algorithmic Learning Theory, 2007.
  40. Bharath Sriperumbudur et al. On the optimal estimation of probability measures in weak and strong topologies. Bernoulli, 22(3):1839-1893, 2016.
  41. Bharath Sriperumbudur and Nicholas Sterge. Approximate kernel PCA using random features: Computational vs. statistical trade-off. Technical report, arXiv, 2018. URL: http://arxiv.org/abs/1706.06296.
  42. Bharath K. Sriperumbudur, Kenji Fukumizu, and Gert R. G. Lanckriet. Universality, characteristic kernels and RKHS embedding of measures. Journal of Machine Learning Research, pages 2389-2410, 2011.
  43. Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517-1561, 2010.
  44. Enayat Ullah, Poorya Mianjy, Teodor V. Marinov, and Raman Arora. Streaming kernel PCA with Õ(√n) random features. In NeurIPS, 2018.
  45. Shusen Wang, Alex Gittens, and Michael W. Mahoney. Scalable kernel k-means clustering with Nyström approximation: Relative-error bounds. Journal of Machine Learning Research, to appear. URL: http://arxiv.org/abs/1706.02803.
  46. David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10:1-157, 2014.
  47. Wojciech Zaremba, Arthur Gretton, and Matthew Blaschko. B-tests: Low variance kernel two-sample tests. In NIPS, 2013.
  48. Ji Zhao and Deyu Meng. FastMMD: Ensemble of circular discrepancy for efficient two-sample test. Neural Computation, 27:1354-1372, 2015.