Revisiting Frequency Moment Estimation in Random Order Streams

Authors Vladimir Braverman, Emanuele Viola, David P. Woodruff, Lin F. Yang



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2018.25.pdf
  • Filesize: 0.6 MB
  • 14 pages

Document Identifiers

Author Details

Vladimir Braverman
  • Johns Hopkins University
Emanuele Viola
  • Northeastern University
David P. Woodruff
  • Carnegie Mellon University
Lin F. Yang
  • Princeton University

Cite AsGet BibTex

Vladimir Braverman, Emanuele Viola, David P. Woodruff, and Lin F. Yang. Revisiting Frequency Moment Estimation in Random Order Streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 25:1-25:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICALP.2018.25

Abstract

We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments F_p for 0 < p < 2 of an underlying n-dimensional vector presented as a sequence of additive updates in a stream. It is well-known that using p-stable distributions one can approximate any of these moments up to a multiplicative (1+epsilon)-factor using O(epsilon^{-2} log n) bits of space, and this space bound is optimal up to a constant factor in the turnstile streaming model. We show that surprisingly, if one instead considers the popular random-order model of insertion-only streams, in which the updates to the underlying vector arrive in a random order, then one can beat this space bound and achieve O~(epsilon^{-2} + log n) bits of space, where the O~ hides poly(log(1/epsilon) + log log n) factors. If epsilon^{-2} ~~ log n, this represents a roughly quadratic improvement in the space achievable in turnstile streams. Our algorithm is in fact deterministic, and we show our space bound is optimal up to poly(log(1/epsilon) + log log n) factors for deterministic algorithms in the random order model. We also obtain a similar improvement in space for p = 2 whenever F_2 >~ log n * F_1.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • Data Stream
  • Frequency Moments
  • Random Order
  • Space Complexity
  • Insertion Only Stream

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Phillip B Gibbons, Yossi Matias, and Mario Szegedy. Tracking join and self-join sizes in limited storage. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 10-20. ACM, 1999. Google Scholar
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20-29. ACM, 1996. Google Scholar
  3. Alexandr Andoni. High frequency moment via max stability. Unpublished manuscript, 2012. Google Scholar
  4. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sampling. In Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 363-372. IEEE, 2011. Google Scholar
  5. Alexandr Andoni, Andrew McGregor, Krzysztof Onak, and Rina Panigrahy. Better bounds for frequency moments in random-order streams. arXiv preprint arXiv:0808.2222, 2008. Google Scholar
  6. Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, and David P Woodruff. Beating countsketch for heavy hitters in insertion streams. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 740-753. ACM, 2016. Google Scholar
  7. Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger. An optimal algorithm for large frequency moments using o (n^(1-2/k)) bits. In LIPIcs-Leibniz International Proceedings in Informatics, volume 28. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014. Google Scholar
  8. Vladimir Braverman and Rafail Ostrovsky. Recursive sketching for frequency moments. arXiv preprint arXiv:1011.2571, 2010. Google Scholar
  9. Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Robust lower bounds for communication and stream computation. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 641-650. ACM, 2008. Google Scholar
  10. Amit Chakrabarti, TS Jayram, and Mihai Patrascu. Tight lower bounds for selection in randomly ordered streams. In SODA, pages 720-729, 2008. Google Scholar
  11. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In Computational Complexity, 2003. Proceedings. 18th IEEE Annual Conference on, pages 107-117. IEEE, 2003. Google Scholar
  12. Michael Crouch, Andrew McGregor, Gregory Valiant, and David P Woodruff. Stochastic streams: Sample complexity vs. space complexity. In LIPIcs-Leibniz International Proceedings in Informatics, volume 57. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016. Google Scholar
  13. Erik Demaine, Alejandro López-Ortiz, and J Munro. Frequency estimation of internet packet streams with limited space. Algorithms—ESA 2002, pages 11-20, 2002. Google Scholar
  14. Devdatt P Dubhashi and Alessandro Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009. Google Scholar
  15. Sudipto Guha and Zhiyi Huang. Revisiting the direct sum theorem and space lower bounds in random order streams. Automata, Languages and Programming, pages 513-524, 2009. Google Scholar
  16. Sudipto Guha and Andrew McGregor. Stream order and order statistics: Quantile estimation in random-order streams. SIAM Journal on Computing, 38(5):2044-2059, 2009. Google Scholar
  17. Godfrey Harold Hardy and Edward Maitland Wright. An introduction to the theory of numbers. Oxford University Press, 1979. Google Scholar
  18. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 189-197. IEEE, 2000. Google Scholar
  19. Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202-208. ACM, 2005. Google Scholar
  20. Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745-754. ACM, 2011. Google Scholar
  21. Daniel M Kane, Jelani Nelson, and David P Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161-1178. SIAM, 2010. Google Scholar
  22. J Ian Munro and Mike S Paterson. Selection and sorting with limited storage. Theoretical computer science, 12(3):315-323, 1980. Google Scholar
  23. David Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 167-175. Society for Industrial and Applied Mathematics, 2004. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail