Document

# Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time

## File

LIPIcs.ESA.2016.52.pdf
• Filesize: 0.57 MB
• 17 pages

## Cite As

Matti Karppa, Petteri Kaski, Jukka Kohonen, and Padraig Ó Catháin. Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 52:1-52:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ESA.2016.52

## Abstract

We derandomize G. Valiant's [J.ACM 62(2015) Art.13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant's randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math 155(2002), 157-187]. We say that a function f:{-1,1}^d ->{-1,1}^D is a correlation amplifier with threshold 0 <= tau <= 1, error gamma >= 1, and strength p an even positive integer if for all pairs of vectors x,y in {-1,1}^d it holds that (i) |<x,y>|<tau d implies |<f(x),f(y)>| <= (tau*gamma)^p*D; and (ii) |<x,y>| >= tau*d implies (<x,y>/gamma^d})^p*D <= <f(x),f(y)> <= (gamma*<x,y>/d)^p*D.
##### Keywords
• correlation
• derandomization
• outlier
• similarity search
• expander graph

## Metrics

• Access Statistics
• Total Accesses (updated on a weekly basis)
0

## References

1. Thomas D. Ahle, Rasmus Pagh, Ilya Razenshteyn, and Francesco Silvestri. On the complexity of inner product similarity join. arXiv, abs/1510.02824, 2015.
2. Josh Alman and Ryan Williams. Probabilistic polynomials and Hamming nearest neighbors. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 136-150, Los Alamitos, CA, USA, 2015. IEEE Computer Society.
3. Noga Alon. Problems and results in extremal combinatorics - I. Discrete Math., 273(1-3):31-53, 2003.
4. Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond locality-sensitive hashing. In Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1018-1028, Philadelphia, PA, USA, 2014. Society for Industrial and Applied Mathematics. URL: http://dx.doi.org/10.1137/1.9781611973402.76.
5. Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proc. 47th ACM Annual Symposium on the Theory of Computing (STOC), pages 793-801, New York, NY, USA, 2015. Association for Computing Machinery. URL: http://dx.doi.org/10.1145/2746539.2746553.
6. L. Elisa Celis, Omer Reingold, Gil Segev, and Udi Wieder. Balls and bins: Smaller hash families and faster evaluation. SIAM J. Comput., 42(3):1030-1050, 2013.
7. Timothy M. Chan and Ryan Williams. Deterministic APSP, orthogonal vectors, and more: Quickly derandomizing Razborov-Smolensky. In Robert Krauthgamer, editor, Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1246-1255, Arlington, VA, USA, 2016. Society for Industrial and Applied Mathematics.
8. Moshe Dubiner. Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Trans. Inf. Theory, 56(8):4166-4179, 2010. URL: http://dx.doi.org/10.1109/TIT.2010.2050814.
9. Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. On agnostic learning of parities, monomials, and halfspaces. SIAM J. Comput., 39(2):606-645, 2009. URL: http://dx.doi.org/10.1137/070684914.
10. Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, Proc. 25th International Conference on Very Large Data Bases (VLDB'99), pages 518-529, Edinburgh, Scotland, UK, 1999. Morgan Kaufmann.
11. Parikshit Gopalan, Daniek Kane, and Raghu Meka. Pseudorandomness via the Discrete Fourier Transform. In Proc. IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 903-922, Berkeley, CA, USA, 2015. IEEE Computer Society.
12. Parikshit Gopalan, Raghu Meka, Omer Reingold, and David Zuckerman. Pseudorandom generators for combinatorial shapes. SIAM J. Comput., 42(3):1051-1076, 2013.
13. Elena Grigorescu, Lev Reyzin, and Santosh Vempala. On noise-tolerant learning of sparse parities and related problems. In Proc. 22nd International Conference on Algorithmic Learning Theory (ALT), pages 413-424, Berlin, Germany, 2011. Springer. URL: http://dx.doi.org/10.1007/978-3-642-24412-4_32.
14. Wassily Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13-30, 1963.
15. Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bull. Amer. Math. Soc., 43(4):439-561, 2006.
16. Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367-375, 2001. URL: http://dx.doi.org/10.1006/jcss.2000.1727.
17. Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annual ACM Symposium on the Theory of Computing (STOC), pages 604-613, New York, NY, USA, 1998. Association for Computing Machinery. URL: http://dx.doi.org/10.1145/276698.276876.
18. Daniel M. Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Proc. 14th International Workshop on Approximation, Randomization, and Combinatorial Optimization, RANDOM and 15th International Workshop on Algorithms and Techniques, APPROX, pages 628-639, Princeton, NJ, USA, 2011.
19. Michael Kapralov. Smooth tradeoffs between insert and query complexity in nearest neighbor search. In Proc. 34th ACM Symposium on Principles of Database Systems (PODS), pages 329-342, New York, NY, USA, 2015. Association for Computing Machinery. URL: http://dx.doi.org/10.1145/2745754.2745761.
20. Matti Karppa, Petteri Kaski, and Jukka Kohonen. A faster subquadratic algorithm for finding outlier correlations. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA), pages 1288-1305, Arlington, VA, USA, 2016. Society for Industrial and Applied Mathematics.
21. Pravesh K. Kothari and Raghu Meka. Almost optimal pseudorandom generators for spherical caps. In Proc. 47th Annual ACM Symposium on Theory of Computing (STOC), pages 247-256, Portland, OR, USA, 2015.
22. François Le Gall. Faster algorithms for rectangular matrix multiplication. In Proc. 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 514-523, Los Alamitos, CA, USA, 2012. IEEE Computer Society. URL: http://dx.doi.org/10.1109/FOCS.2012.80.
23. A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica, 8:261-277, 1988.
24. Alexander May and Ilya Ozerov. On computing nearest neighbors with applications to decoding of binary linear codes. In Proc. EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 203-228, Berlin, Germany, 2015. Springer. URL: http://dx.doi.org/10.1007/978-3-662-46800-5_9.
25. Elchanan Mossel, Ryan O'Donnell, and Rocco A. Servedio. Learning functions of k relevant variables. J. Comput. Syst. Sci., 69(3):421-434, 2004. URL: http://dx.doi.org/10.1016/j.jcss.2004.04.002.
26. Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. SIAM J. Discrete Math., 21(4):930-935, 2007. URL: http://dx.doi.org/10.1137/050646858.
27. Ryan O'Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Trans. Comput. Theory, 6(1):Article 5, 2014. URL: http://dx.doi.org/10.1145/2578221.
28. Rasmus Pagh. Locality-sensitive hashing without false negatives. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1-9, Philadelphia, PA, USA, 2016. Society for Industrial and Applied Mathematics.
29. Ramamohan Paturi, Sanguthevar Rajasekaran, and John H. Reif. The light bulb problem. In Proc. 2nd Annual Workshop on Computational Learning Theory (COLT), pages 261-268, New York, NY, USA, 1989. Association for Computing Machinery.
30. Ninh Pham and Rasmus Pagh. Scalability and total recall with fast CoveringLSH. arXiv, abs/1602.02620, 2016.
31. Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders. Ann. of Math., 155(1):157-187, 2002.
32. Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Math. Comp., 54:435-447, 1990.
33. Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities and the closest pair problem. J. ACM, 62(2):Article 13, 2015. URL: http://dx.doi.org/10.1145/2728167.
34. Leslie G. Valiant. Functionality in neural nets. In Proc. 1st Annual Workshop on Computational Learning Theory (COLT), pages 28-39, New York, NY, USA, 1988. Association for Computing Machinery.