On Low-Risk Heavy Hitters and Sparse Recovery Schemes

Authors Yi Li, Vasileios Nakos, David P. Woodruff






Author Details

Yi Li
  • Nanyang Technological University, Singapore
Vasileios Nakos
  • Harvard University, USA
David P. Woodruff
  • Carnegie Mellon University, USA

Cite As

Yi Li, Vasileios Nakos, and David P. Woodruff. On Low-Risk Heavy Hitters and Sparse Recovery Schemes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 19:1-19:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.19

Abstract

We study the heavy hitters problem and related sparse recovery problems in the low failure probability regime. This regime is not well understood, and the main previous work on it is by Gilbert et al. (ICALP'13). We identify an error in their analysis, improve their results, contribute new sparse recovery algorithms, and provide upper and lower bounds for the heavy hitters problem with low failure probability. Our results are summarized as follows:
1) (Heavy Hitters) We study three natural variants of finding heavy hitters in the strict turnstile model, where the variant depends on the quality of the desired output. For the weakest variant, we give a randomized algorithm improving the failure probability analysis of the ubiquitous Count-Min data structure. We also give a new lower bound for deterministic schemes, resolving a question about this variant posed as Question 4 at the IITK Workshop on Algorithms for Data Streams (2006). Under the strongest and well-studied ℓ_∞/ℓ_2 variant, we show that the classical Count-Sketch data structure is optimal for very low failure probabilities, which was previously unknown.
2) (Sparse Recovery Algorithms) For non-adaptive sparse recovery, we give sublinear-time algorithms with low failure probability, which improve upon Gilbert et al. (ICALP'13). In the adaptive case, we improve the failure probability from the constant achieved by Indyk et al. (FOCS'11) to e^{-k^{0.99}}, where k is the sparsity parameter.
3) (Optimal Average-Case Sparse Recovery Bounds) We give matching upper and lower bounds in all parameters, including the failure probability, for the measurement complexity of the ℓ_2/ℓ_2 sparse recovery problem in the spiked-covariance model, completely settling its complexity in this model.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • heavy hitters
  • sparse recovery
  • turnstile model
  • spiked covariance model
  • lower bounds


References

  1. Zeyuan Allen Zhu, Rati Gelashvili, and Ilya P. Razenshteyn. Restricted isometry property for general p-norms. IEEE Trans. Information Theory, 62(10):5839-5854, 2016.
  2. Vladimir Braverman, Gereon Frahling, Harry Lang, Christian Sohler, and Lin F. Yang. Clustering high dimensional dynamic data streams. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 576-585, 2017.
  3. Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3-15, 2004.
  4. Graham Cormode and Marios Hadjieleftheriou. Finding frequent items in data streams. Proceedings of the VLDB Endowment, 1(2):1530-1541, 2008.
  5. Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, 2005.
  6. Gereon Frahling and Christian Sohler. Coresets in dynamic geometric data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22-24, 2005, pages 209-217, 2005.
  7. S. Ganguly and A. Majumder. CR-precis: A deterministic summary structure for update data streams. eprint arXiv:cs/0609032, 2006.
  8. Sumit Ganguly. Data stream algorithms via expander graphs. In International Symposium on Algorithms and Computation, pages 52-63. Springer, 2008.
  9. Anna C Gilbert, Yi Li, Ely Porat, and Martin J Strauss. Approximate sparse recovery: optimizing time and measurements. SIAM Journal on Computing, 41(2):436-453, 2012.
  10. Anna C Gilbert, Hung Q Ngo, Ely Porat, Atri Rudra, and Martin J Strauss. ℓ_2/ℓ_2-foreach sparse recovery with low risk. In International Colloquium on Automata, Languages, and Programming, pages 461-472. Springer, 2013.
  11. Robert D. Gordon. Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann. Math. Statist., 12(3):364-366, 1941.
  12. Rishi Gupta, Piotr Indyk, Eric Price, and Yaron Rachlin. Compressive sensing with local geometric features. Int. J. Comput. Geometry Appl., 22(4):365, 2012.
  13. Venkatesan Guruswami, Christopher Umans, and Salil Vadhan. Unbalanced expanders and randomness extractors from Parvaresh-Vardy codes. Journal of the ACM (JACM), 56(4):20, 2009.
  14. Piotr Indyk. Algorithms for dynamic geometric problems over data streams. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 373-380, 2004.
  15. Piotr Indyk, Eric Price, and David P Woodruff. On the power of adaptivity in sparse recovery. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 285-294. IEEE, 2011.
  16. Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 49-58. ACM, 2011.
  17. Kasper Green Larsen, Jelani Nelson, Huy L Nguyen, and Mikkel Thorup. Heavy hitters via cluster-preserving clustering. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 61-70. IEEE, 2016.
  18. Andrew McGregor. Open problems in data streams and related topics: IITK workshop on algorithms for data streams, 2006, 2007.
  19. Vasileios Nakos, Xiaofei Shi, David P. Woodruff, and Hongyang Zhang. Improved algorithms for adaptive compressed sensing. In ICALP, 2018.
  20. Jelani Nelson, Huy L. Nguyên, and David P. Woodruff. On deterministic sketching and streaming for sparse recovery and norm estimation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings, pages 627-638, 2012.
  21. E. Price and D. P. Woodruff. (1 + ε)-approximate sparse recovery. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 295-304, Oct 2011.