
Separations for Estimating Large Frequency Moments on Data Streams

Authors David P. Woodruff, Samson Zhou




File

LIPIcs.ICALP.2021.112.pdf
  • Filesize: 0.84 MB
  • 21 pages

Author Details

David P. Woodruff
  • Carnegie Mellon University, Pittsburgh, PA, USA
Samson Zhou
  • Carnegie Mellon University, Pittsburgh, PA, USA

Cite As

David P. Woodruff and Samson Zhou. Separations for Estimating Large Frequency Moments on Data Streams. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 112:1-112:21, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.112

Abstract

We study the classical problem of moment estimation of an underlying vector whose n coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order insertion-only model, then there exist space efficient algorithms with improved dependencies on the approximation parameter ε. In particular, for any real p > 2, we first obtain an algorithm for F_p moment estimation using 𝒪̃(1/(ε^{4/p})⋅ n^{1-2/p}) bits of memory. Our techniques also give algorithms for F_p moment estimation with p > 2 on arbitrary order insertion-only and turnstile streams, using 𝒪̃(1/(ε^{4/p})⋅ n^{1-2/p}) bits of space and two passes, which is the first optimal multi-pass F_p estimation algorithm up to log n factors. Finally, we give an improved lower bound of Ω(1/(ε²)⋅ n^{1-2/p}) for one-pass insertion-only streams. Our results separate the complexity of this problem both between random and non-random orders, as well as one-pass and multi-pass streams.
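
To make the quantities in the abstract concrete, here is a minimal Python sketch, not the authors' algorithm. It computes F_p exactly from an insertion-only stream using the standard definition F_p = Σ_i f_i^p, where f_i is the number of occurrences of item i, and it compares the ε-dependence of the two bounds stated above. The function names are hypothetical, and the bound functions drop all constants and polylogarithmic factors, so they show only how the bounds scale. Note that the exact computation stores one counter per distinct item, which is the linear space cost that streaming algorithms are designed to avoid.

```python
from collections import Counter


def exact_fp(stream, p):
    """Exact F_p = sum_i f_i^p for an insertion-only stream of items.

    Uses one counter per distinct item, so space is linear in the
    number of distinct items (this is the baseline, not a sketch).
    """
    freq = Counter(stream)
    return sum(count ** p for count in freq.values())


def upper_bound_scale(n, eps, p):
    """Scale of the stated upper bound: n^(1-2/p) / eps^(4/p).

    Constants and polylog factors are ignored (assumption for illustration).
    """
    return n ** (1 - 2 / p) / eps ** (4 / p)


def lower_bound_scale(n, eps, p):
    """Scale of the stated one-pass lower bound: n^(1-2/p) / eps^2."""
    return n ** (1 - 2 / p) / eps ** 2


# Frequencies are 3, 2, 1, so F_3 = 27 + 8 + 1.
stream = [1, 2, 1, 3, 1, 2]
print(exact_fp(stream, 3))

# The ratio of the two scales is eps^-(2 - 4/p), independent of n.
n, eps, p = 10 ** 6, 0.1, 4
print(lower_bound_scale(n, eps, p) / upper_bound_scale(n, eps, p))
```

For any p > 2 the exponent 4/p is strictly less than 2, so the ratio of the two scales grows as ε shrinks. That growing gap in the ε-dependence is what the abstract refers to as a separation.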

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • streaming algorithms
  • frequency moments
  • random order
  • lower bounds

