On Low-Risk Heavy Hitters and Sparse Recovery Schemes

Li, Yi; Nakos, Vasileios; Woodruff, David P.

doi:10.4230/LIPIcs.APPROX-RANDOM.2018.19

Abstract

We study the heavy hitters and related sparse recovery problems in the low failure probability regime. This regime is not well understood, and the main previous work on it is by Gilbert et al. (ICALP'13). We identify an error in their analysis, improve their results, and contribute new sparse recovery algorithms, as well as provide upper and lower bounds for the heavy hitters problem with low failure probability. Our results are summarized as follows:
1) (Heavy Hitters) We study three natural variants for finding heavy hitters in the strict turnstile model, where the variant depends on the quality of the desired output. For the weakest variant, we give a randomized algorithm improving the failure probability analysis of the ubiquitous Count-Min data structure. We also give a new lower bound for deterministic schemes, resolving a question about this variant posed as Question 4 at the IITK Workshop on Algorithms for Data Streams (2006). Under the strongest and well-studied l_∞/l_2 variant, we show that the classical Count-Sketch data structure is optimal for very low failure probabilities, which was previously unknown.
2) (Sparse Recovery Algorithms) For non-adaptive sparse recovery, we give sublinear-time algorithms with low failure probability, which improve upon Gilbert et al. (ICALP'13). In the adaptive case, we improve the failure probability from a constant by Indyk et al. (FOCS '11) to e^{-k^{0.99}}, where k is the sparsity parameter.
3) (Optimal Average-Case Sparse Recovery Bounds) We give matching upper and lower bounds in all parameters, including the failure probability, for the measurement complexity of the l_2/l_2 sparse recovery problem in the spiked-covariance model, completely settling its complexity in this model.
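For readers unfamiliar with the Count-Min data structure named in item 1, the following is a minimal sketch of the textbook version in the strict turnstile model (updates may be positive or negative, but every final count is non-negative). It is not the improved algorithm or analysis from the paper; the class name `CountMin`, the parameters `width`, `depth`, and `seed`, and the choice of hash family are illustrative assumptions.

```python
import random


class CountMin:
    """Textbook Count-Min sketch; an illustration, not the paper's scheme."""

    # Assumed hash family: h(x) = ((a*x + b) mod p) mod width, with p prime.
    _PRIME = (1 << 61) - 1

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]
        self.hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self._PRIME) % self.width

    def update(self, item, delta):
        """Apply a turnstile update: add delta (possibly negative) to item."""
        for row in range(len(self.table)):
            self.table[row][self._bucket(row, item)] += delta

    def estimate(self, item):
        """Return the minimum counter over all rows.

        When all final counts are non-negative (strict turnstile),
        colliding items can only add to a counter, so this never
        underestimates the true count.
        """
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(len(self.table))
        )


cm = CountMin(width=16, depth=4, seed=1)
cm.update(7, 5)   # five insertions of item 7
cm.update(7, -2)  # two deletions of item 7
print(cm.estimate(7))  # 3 (exact here, since no other item is present)
```

In this textbook form, more rows (`depth`) lower the chance that every row suffers a heavy collision for a given item, which is the failure probability the abstract refers to.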

Cite As

Yi Li, Vasileios Nakos, and David P. Woodruff. On Low-Risk Heavy Hitters and Sparse Recovery Schemes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 19:1-19:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.19

Author Details

Yi Li

Nanyang Technological University, Singapore

Vasileios Nakos

Harvard University, USA

David P. Woodruff

Carnegie Mellon University, USA

Funding

Nakos, Vasileios: Supported in part by NSF grant IIS-1447471
Woodruff, David P.: Supported in part by NSF grant CCF-1815840

References

Zeyuan Allen Zhu, Rati Gelashvili, and Ilya P. Razenshteyn. Restricted isometry property for general p-norms. IEEE Trans. Information Theory, 62(10):5839-5854, 2016.
Vladimir Braverman, Gereon Frahling, Harry Lang, Christian Sohler, and Lin F. Yang. Clustering high dimensional dynamic data streams. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 576-585, 2017.
Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3-15, 2004.
Graham Cormode and Marios Hadjieleftheriou. Finding frequent items in data streams. Proceedings of the VLDB Endowment, 1(2):1530-1541, 2008.
Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, 2005.
Gereon Frahling and Christian Sohler. Coresets in dynamic geometric data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22-24, 2005, pages 209-217, 2005.
S. Ganguly and A. Majumder. CR-precis: A deterministic summary structure for update data streams. eprint arXiv:cs/0609032, 2006.
Sumit Ganguly. Data stream algorithms via expander graphs. In International Symposium on Algorithms and Computation, pages 52-63. Springer, 2008.
Anna C Gilbert, Yi Li, Ely Porat, and Martin J Strauss. Approximate sparse recovery: optimizing time and measurements. SIAM Journal on Computing, 41(2):436-453, 2012.
Anna C Gilbert, Hung Q Ngo, Ely Porat, Atri Rudra, and Martin J Strauss. 𝓁₂/𝓁₂-foreach sparse recovery with low risk. In International Colloquium on Automata, Languages, and Programming, pages 461-472. Springer, 2013.
Robert D. Gordon. Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statist., 12(3):364-366, 1941.
Rishi Gupta, Piotr Indyk, Eric Price, and Yaron Rachlin. Compressive sensing with local geometric features. Int. J. Comput. Geometry Appl., 22(4):365, 2012.
Venkatesan Guruswami, Christopher Umans, and Salil Vadhan. Unbalanced expanders and randomness extractors from Parvaresh-Vardy codes. Journal of the ACM (JACM), 56(4):20, 2009.
Piotr Indyk. Algorithms for dynamic geometric problems over data streams. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 373-380, 2004.
Piotr Indyk, Eric Price, and David P Woodruff. On the power of adaptivity in sparse recovery. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 285-294. IEEE, 2011.
Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for L_p samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 49-58. ACM, 2011.
Kasper Green Larsen, Jelani Nelson, Huy L Nguyen, and Mikkel Thorup. Heavy hitters via cluster-preserving clustering. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 61-70. IEEE, 2016.
Andrew McGregor. Open problems in data streams and related topics: IITK workshop on algorithms for data streams, 2006, 2007.
Vasileios Nakos, Xiaofei Shi, David P. Woodruff, and Hongyang Zhang. Improved algorithms for adaptive compressed sensing. In ICALP, 2018.
Jelani Nelson, Huy L. Nguyên, and David P. Woodruff. On deterministic sketching and streaming for sparse recovery and norm estimation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings, pages 627-638, 2012.
E. Price and D. P. Woodruff. (1 + ε)-approximate sparse recovery. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 295-304, Oct 2011.
