High Probability Frequency Moment Sketches

Authors Sumit Ganguly, David P. Woodruff



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2018.58.pdf
  • Filesize: 0.69 MB
  • 15 pages

Document Identifiers

Author Details

Sumit Ganguly
  • Indian Institute of Technology, Kanpur, India
David P. Woodruff
  • Carnegie Mellon University, School of Computing, Pittsburg, USA

Cite As Get BibTex

Sumit Ganguly and David P. Woodruff. High Probability Frequency Moment Sketches. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 58:1-58:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ICALP.2018.58

Abstract

We consider the problem of sketching the p-th frequency moment of a vector, p>2, with multiplicative error at most 1 +/- epsilon and with high confidence 1-delta. Despite the long sequence of work on this problem, tight bounds on this quantity are only known for constant delta. While one can obtain an upper bound with error probability delta by repeating a sketching algorithm with constant error probability O(log(1/delta)) times in parallel, and taking the median of the outputs, we show this is a suboptimal algorithm! Namely, we show optimal upper and lower bounds of Theta(n^{1-2/p} log(1/delta) + n^{1-2/p} log^{2/p} (1/delta) log n) on the sketching dimension, for any constant approximation. Our result should be contrasted with results for estimating frequency moments for 1 <= p <= 2, for which we show the optimal algorithm for general delta is obtained by repeating the optimal algorithm for constant error probability O(log(1/delta)) times and taking the median output. We also obtain a matching lower bound for this problem, up to constant factors.

Subject Classification

ACM Subject Classification
  • Theory of computation → Lower bounds and information complexity
Keywords
  • Data Streams
  • Frequency Moments
  • High Confidence

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. JCSS, 58(1):137-147, 1999. Google Scholar
  2. Alexandr Andoni. High frequency moment via max stability. Available at http://web.mit.edu/andoni/www/papers/fkStable.pdf. Google Scholar
  3. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms from precision sampling. CoRR, abs/1011.1263, 2010. URL: http://arxiv.org/abs/1011.1263.
  4. Alexandr Andoni, Huy L. Nguyen, Yury Polyanskiy, and Yihong Wu. "Tight Lower Bound for Linear Sketches of Moments". In Proceedings of International Conference on Automata, Languages and Programming, (ICALP), jul 2013. Version published as arXiv:1306.6295, June 2013. Google Scholar
  5. Sepehr Assadi, Sanjeev Khanna, Yang Li, and Grigory Yaroslavtsev. Maximum matchings in dynamic graph streams and the simultaneous communication model. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1345-1364, 2016. Google Scholar
  6. Z. Bar-Yossef, T.S. Jayram, R. Kumar, and D. Sivakumar. "An information statistics approach to data stream and communication complexity". In Proceedings of ACM Symposium on Theory of Computing STOC, pages 209-218, 2002. Google Scholar
  7. Lakshminath Bhuvanagiri, Sumit Ganguly, Deepanjan Kesh, and Chandan Saha. Simpler algorithm for estimating frequency moments of data streams. In SODA, pages 708-713, 2006. URL: http://dx.doi.org/10.1145/1109557.1109634.
  8. Vladimir Braverman and Rafail Ostrovsky. Recursive sketching for frequency moments. CoRR, abs/1011.2571, 2010. Google Scholar
  9. Emmanuel Candès, Justin Romberg, and Terence Tao. "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information". IEEE Trans. Inf. Theory, 52(2):489-509, feb 2006. Google Scholar
  10. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In CCC, pages 107-117, 2003. Google Scholar
  11. Moses Charikar, Kevin Chen, and Martin Farach-Colton. "Finding frequent items in data streams". Theoretical Computer Science, 312(1):3-15, 2004. Preliminary version appeared in Proceedings of ICALP 2002, pages 693-703. Google Scholar
  12. Don Coppersmith and Ravi Kumar. An improved data stream algorithm for frequency moments. In SODA, 2004. Google Scholar
  13. David L. Donoho. "Compressed Sensing". IEEE Trans. Inf. Theory, 52(4):1289-1306, apr 2006. Google Scholar
  14. Sumit Ganguly. Estimating frequency moments of data streams using random linear combinations. In RANDOM, 2004. Google Scholar
  15. Sumit Ganguly. A hybrid algorithm for estimating frequency moments of data streams, 2004. Manuscript. Google Scholar
  16. Sumit Ganguly. Polynomial estimators for high frequency moments. CoRR, abs/1104.4552, 2011. Google Scholar
  17. Sumit Ganguly. "Taylor Polynomial Estimator for Estimating Frequency Moments". In Proceedings of International Conference on Automata, Languages and Programming, (ICALP), 2015. Full version in arXiv:1506.01442. Google Scholar
  18. Moritz Hardt and David P. Woodruff. How robust are linear sketches to adaptive inputs? In Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 121-130, 2013. Google Scholar
  19. P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In STOC. ACM, 2005. Google Scholar
  20. Piotr Indyk. "Stable Distributions, Pseudo Random Generators, Embeddings and Data Stream Computation". In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, pages 189-197, 2000. Google Scholar
  21. Y. I. Ingster and L.A. Suslina. "Non-parametric goodness-of-fit testing under Gaussian models", volume 169 of Lecture Notes in Statistics. Springer-Verlag, 2003. Google Scholar
  22. Daniel Kane, Jelani Nelson, Ely Porat, and David Woodruff. "Fast Moment Estimation in Data Streams in Optimal Space". In Proceedings of 2011 ACM Symposium on Theory of Computing, version arXiv:1007.4191v1 July, 2010. Google Scholar
  23. Daniel Kane, Jelani Nelson, Ely Porat, and David Woodruff. "Fast Moment Estimation in Data Streams in Optimal Space". In Proceedings of 2011 ACM Symposium on Theory of Computing, version arXiv:1007.4191v1 July, 2011. Google Scholar
  24. Daniel M. Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit johnson-lindenstrauss families. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 14th International Workshop, APPROX 2011, and 15th International Workshop, RANDOM 2011, Princeton, NJ, USA, August 17-19, 2011. Proceedings, pages 628-639, 2011. Google Scholar
  25. Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data streams in optimal space. In STOC, pages 745-754, 2011. Google Scholar
  26. Daniel M. Kane, Jelani Nelson, and David Woodruff. "An Optimal Algorithm for the Distinct Elements Problem". In Proceedings of ACM International Symposium on Principles of Database Systems (PODS), pages 41-52, 2010. Google Scholar
  27. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. "On the Exact Space Complexity of Sketching and Streaming Small Norms". In Proceedings of ACM Symposium on Discrete Algorithms (SODA), 2010. Google Scholar
  28. Christian Konrad. Maximum matching in turnstile streams. In Algorithms - ESA 2015 - 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings, pages 840-852, 2015. Google Scholar
  29. Yi Li, Huy L. Nguyen, and David P. Woodruff. On sketching matrix norms and the top singular vector. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1562-1581, 2014. Google Scholar
  30. Yi Li and David Woodruff. "A Tight Lower Bound for High Frequency Moment Estimation with Small Error". In Proceedings of International Workshop on Randomization and Computation (RANDOM), 2013. Google Scholar
  31. Morteza Monemizadeh and David P. Woodruff. 1-pass relative-error l_p-sampling with applications. In SODA, 2010. Google Scholar
  32. Eric Price and David P. Woodruff. Applications of the shannon-hartley theorem to data streams and sparse recovery. In ISIT, 2012. Google Scholar
  33. Alexandre B. Tsybakov. "Introduction to Nonparametric Estimation". Springer, 1 edition, 2008. Google Scholar
  34. Omri Weinstein and David P. Woodruff. The simultaneous communication of disjointness with applications to data streams. In Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I, pages 1082-1093, 2015. Google Scholar
  35. David P. Woodruff. "Sketching as a Tool for Numerical Linear Algebra". Foundations and Trends in Theoretical Computer Science 10:1-2, Now Publications, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail