Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time

Authors Matti Karppa, Petteri Kaski, Jukka Kohonen, Padraig Ó Catháin



PDF
Thumbnail PDF

File

LIPIcs.ESA.2016.52.pdf
  • Filesize: 0.57 MB
  • 17 pages

Document Identifiers

Author Details

Matti Karppa
Petteri Kaski
Jukka Kohonen
Padraig Ó Catháin

Cite AsGet BibTex

Matti Karppa, Petteri Kaski, Jukka Kohonen, and Padraig Ó Catháin. Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 52:1-52:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ESA.2016.52

Abstract

We derandomize G. Valiant's [J.ACM 62(2015) Art.13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant's randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math 155(2002), 157-187]. We say that a function f:{-1,1}^d ->{-1,1}^D is a correlation amplifier with threshold 0 <= tau <= 1, error gamma >= 1, and strength p an even positive integer if for all pairs of vectors x,y in {-1,1}^d it holds that (i) |<x,y>|<tau d implies |<f(x),f(y)>| <= (tau*gamma)^p*D; and (ii) |<x,y>| >= tau*d implies (<x,y>/gamma^d})^p*D <= <f(x),f(y)> <= (gamma*<x,y>/d)^p*D.
Keywords
  • correlation
  • derandomization
  • outlier
  • similarity search
  • expander graph

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Thomas D. Ahle, Rasmus Pagh, Ilya Razenshteyn, and Francesco Silvestri. On the complexity of inner product similarity join. arXiv, abs/1510.02824, 2015. Google Scholar
  2. Josh Alman and Ryan Williams. Probabilistic polynomials and Hamming nearest neighbors. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 136-150, Los Alamitos, CA, USA, 2015. IEEE Computer Society. Google Scholar
  3. Noga Alon. Problems and results in extremal combinatorics - I. Discrete Math., 273(1-3):31-53, 2003. Google Scholar
  4. Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond locality-sensitive hashing. In Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1018-1028, Philadelphia, PA, USA, 2014. Society for Industrial and Applied Mathematics. URL: http://dx.doi.org/10.1137/1.9781611973402.76.
  5. Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proc. 47th ACM Annual Symposium on the Theory of Computing (STOC), pages 793-801, New York, NY, USA, 2015. Association for Computing Machinery. URL: http://dx.doi.org/10.1145/2746539.2746553.
  6. L. Elisa Celis, Omer Reingold, Gil Segev, and Udi Wieder. Balls and bins: Smaller hash families and faster evaluation. SIAM J. Comput., 42(3):1030-1050, 2013. Google Scholar
  7. Timothy M. Chan and Ryan Williams. Deterministic APSP, orthogonal vectors, and more: Quickly derandomizing Razborov-Smolensky. In Robert Krauthgamer, editor, Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1246-1255, Arlington, VA, USA, 2016. Society for Industrial and Applied Mathematics. Google Scholar
  8. Moshe Dubiner. Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Trans. Inf. Theory, 56(8):4166-4179, 2010. URL: http://dx.doi.org/10.1109/TIT.2010.2050814.
  9. Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. On agnostic learning of parities, monomials, and halfspaces. SIAM J. Comput., 39(2):606-645, 2009. URL: http://dx.doi.org/10.1137/070684914.
  10. Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, Proc. 25th International Conference on Very Large Data Bases (VLDB'99), pages 518-529, Edinburgh, Scotland, UK, 1999. Morgan Kaufmann. Google Scholar
  11. Parikshit Gopalan, Daniek Kane, and Raghu Meka. Pseudorandomness via the Discrete Fourier Transform. In Proc. IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 903-922, Berkeley, CA, USA, 2015. IEEE Computer Society. Google Scholar
  12. Parikshit Gopalan, Raghu Meka, Omer Reingold, and David Zuckerman. Pseudorandom generators for combinatorial shapes. SIAM J. Comput., 42(3):1051-1076, 2013. Google Scholar
  13. Elena Grigorescu, Lev Reyzin, and Santosh Vempala. On noise-tolerant learning of sparse parities and related problems. In Proc. 22nd International Conference on Algorithmic Learning Theory (ALT), pages 413-424, Berlin, Germany, 2011. Springer. URL: http://dx.doi.org/10.1007/978-3-642-24412-4_32.
  14. Wassily Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13-30, 1963. Google Scholar
  15. Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bull. Amer. Math. Soc., 43(4):439-561, 2006. Google Scholar
  16. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: http://dx.doi.org/10.1006/jcss.2000.1727.
  17. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annual ACM Symposium on the Theory of Computing (STOC), pages 604-613, New York, NY, USA, 1998. Association for Computing Machinery. URL: http://dx.doi.org/10.1145/276698.276876.
  18. Daniel M. Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Proc. 14th International Workshop on Approximation, Randomization, and Combinatorial Optimization, RANDOM and 15th International Workshop on Algorithms and Techniques, APPROX, pages 628-639, Princeton, NJ, USA, 2011. Google Scholar
  19. Michael Kapralov. Smooth tradeoffs between insert and query complexity in nearest neighbor search. In Proc. 34th ACM Symposium on Principles of Database Systems (PODS), pages 329-342, New York, NY, USA, 2015. Association for Computing Machinery. URL: http://dx.doi.org/10.1145/2745754.2745761.
  20. Matti Karppa, Petteri Kaski, and Jukka Kohonen. A faster subquadratic algorithm for finding outlier correlations. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA), pages 1288-1305, Arlington, VA, USA, 2016. Society for Industrial and Applied Mathematics. Google Scholar
  21. Pravesh K. Kothari and Raghu Meka. Almost optimal pseudorandom generators for spherical caps. In Proc. 47th Annual ACM Symposium on Theory of Computing (STOC), pages 247-256, Portland, OR, USA, 2015. Google Scholar
  22. François Le Gall. Faster algorithms for rectangular matrix multiplication. In Proc. 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 514-523, Los Alamitos, CA, USA, 2012. IEEE Computer Society. URL: http://dx.doi.org/10.1109/FOCS.2012.80.
  23. A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica, 8:261-277, 1988. Google Scholar
  24. Alexander May and Ilya Ozerov. On computing nearest neighbors with applications to decoding of binary linear codes. In Proc. EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 203-228, Berlin, Germany, 2015. Springer. URL: http://dx.doi.org/10.1007/978-3-662-46800-5_9.
  25. Elchanan Mossel, Ryan O'Donnell, and Rocco A. Servedio. Learning functions of k relevant variables. J. Comput. Syst. Sci., 69(3):421-434, 2004. URL: http://dx.doi.org/10.1016/j.jcss.2004.04.002.
  26. Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. SIAM J. Discrete Math., 21(4):930-935, 2007. URL: http://dx.doi.org/10.1137/050646858.
  27. Ryan O'Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Trans. Comput. Theory, 6(1):Article 5, 2014. URL: http://dx.doi.org/10.1145/2578221.
  28. Rasmus Pagh. Locality-sensitive hashing without false negatives. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1-9, Philadelphia, PA, USA, 2016. Society for Industrial and Applied Mathematics. Google Scholar
  29. Ramamohan Paturi, Sanguthevar Rajasekaran, and John H. Reif. The light bulb problem. In Proc. 2nd Annual Workshop on Computational Learning Theory (COLT), pages 261-268, New York, NY, USA, 1989. Association for Computing Machinery. Google Scholar
  30. Ninh Pham and Rasmus Pagh. Scalability and total recall with fast CoveringLSH. arXiv, abs/1602.02620, 2016. Google Scholar
  31. Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders. Ann. of Math., 155(1):157-187, 2002. Google Scholar
  32. Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Math. Comp., 54:435-447, 1990. Google Scholar
  33. Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities and the closest pair problem. J. ACM, 62(2):Article 13, 2015. URL: http://dx.doi.org/10.1145/2728167.
  34. Leslie G. Valiant. Functionality in neural nets. In Proc. 1st Annual Workshop on Computational Learning Theory (COLT), pages 28-39, New York, NY, USA, 1988. Association for Computing Machinery. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail