An Optimal Algorithm for Large Frequency Moments Using O(n^(1-2/k)) Bits

Authors Vladimir Braverman, Jonathan Katzman, Charles Seidell, Gregory Vorsanger



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2014.531.pdf
  • Filesize: 0.5 MB
  • 14 pages

Document Identifiers

Author Details

Vladimir Braverman
Jonathan Katzman
Charles Seidell
Gregory Vorsanger

Cite As Get BibTex

Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger. An Optimal Algorithm for Large Frequency Moments Using O(n^(1-2/k)) Bits. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 28, pp. 531-544, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.531

Abstract

In this paper, we provide the first optimal algorithm for the remaining open question from the seminal paper of Alon, Matias, and Szegedy: approximating large frequency moments. We give an upper bound on the space required to find a k-th frequency moment of O(n^(1-2/k)) bits that matches, up to a constant factor, the lower bound of Woodruff et. al for constant epsilon and constant k.
Our algorithm makes a single pass over the stream and works for any constant k > 3. It is based upon two major technical accomplishments: first, we provide an optimal algorithm for finding the heavy elements in a stream; and second, we provide a technique using Martingale Sketches which gives an optimal reduction of the large frequency moment problem to the all heavy elements problem. We also provide a polylogarithmic improvement for frequency moments, frequency based functions, spatial data streams, and measuring independence of data sets.

Subject Classification

Keywords
  • Streaming Algorithms
  • Randomized Algorithms
  • Frequency Moments
  • Heavy Hitters

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. Google Scholar
  2. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sampling. In Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS'11, pages 363-372, Washington, DC, USA, 2011. IEEE Computer Society. Google Scholar
  3. Alexandr Andoni, Huy L. Nguyen, Yury Polyanskiy, and Yihong Wu. Tight lower bound for linear sketches of moments. In Proceedings of the 40th International Conference on Automata, Languages, and Programming - Volume Part I, ICALP'13, pages 25-32, Berlin, Heidelberg, 2013. Springer-Verlag. Google Scholar
  4. Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In Proceedings of the 43rd Symposium on Foundations of Computer Science, FOCS'02, pages 209-218, Washington, DC, USA, 2002. IEEE Computer Society. Google Scholar
  5. Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In Proceedings of the 6th International Workshop on Randomization and Approximation Techniques, RANDOM'02, pages 1-10, London, UK, UK, 2002. Springer-Verlag. Google Scholar
  6. Paul Beame, T. S. Jayram, and Atri Rudra. Lower bounds for randomized read/write stream algorithms. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, STOC'07, pages 689-698, New York, NY, USA, 2007. ACM. Google Scholar
  7. Lakshminath Bhuvanagiri, Sumit Ganguly, Deepanjan Kesh, and Chandan Saha. Simpler algorithm for estimating frequency moments of data streams. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, SODA'06, pages 708-713, New York, NY, USA, 2006. ACM. Google Scholar
  8. Vladimir Braverman, Ran Gelles, and Rafail Ostrovsky. How to catch l 2-heavy-hitters on sliding windows. In Ding-Zhu Du and Guochuan Zhang, editors, Computing and Combinatorics, volume 7936 of Lecture Notes in Computer Science, pages 638-650. Springer Berlin Heidelberg, 2013. Google Scholar
  9. Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger. Approximating Large Frequency Moments with O(n^1-2/k) Bits. CoRR, abs/1401.1763, 2014. Google Scholar
  10. Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS'07, pages 283-293, Washington, DC, USA, 2007. IEEE Computer Society. Google Scholar
  11. Vladimir Braverman and Rafail Ostrovsky. Effective computations on sliding windows. SIAM J. Comput., 39(6):2113-2131, March 2010. Google Scholar
  12. Vladimir Braverman and Rafail Ostrovsky. Measuring independence of datasets. In Proceedings of the 42nd ACM symposium on Theory of computing, STOC'10, pages 271-280, New York, NY, USA, 2010. ACM. Google Scholar
  13. Vladimir Braverman and Rafail Ostrovsky. Recursive sketching for frequency moments. CoRR, abs/1011.2571, 2010. Google Scholar
  14. Vladimir Braverman and Rafail Ostrovsky. Zero-one frequency laws. In Proceedings of the 42nd ACM symposium on Theory of computing, STOC'10, pages 281-290, New York, NY, USA, 2010. ACM. Google Scholar
  15. Vladimir Braverman and Rafail Ostrovsky. Approximating large frequency moments with pick-and-drop sampling. Accepted to the 16th. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX'2013)., 2013. Google Scholar
  16. Vladimir Braverman and Rafail Ostrovsky. Generalizing the layering method of Indyk and Woodruff: Recursive sketches for frequency-based vectors on streams. Accepted to the 16th. International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX'2013)., 2013. Google Scholar
  17. Vladimir Braverman, Rafail Ostrovsky, and Carlo Zaniolo. Optimal sampling from sliding windows. J. Comput. Syst. Sci., 78(1):260-272, January 2012. Google Scholar
  18. Amit Chakrabarti, Graham Cormode, and Andrew McGregor. Robust lower bounds for communication and stream computation. In Proceedings of the 40th annual ACM symposium on Theory of computing, STOC'08, pages 641-650, New York, NY, USA, 2008. ACM. Google Scholar
  19. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107-117, 2003. Google Scholar
  20. Don Coppersmith and Ravi Kumar. An improved data stream algorithm for frequency moments. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, SODA'04, pages 151-156, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics. Google Scholar
  21. Graham Cormode. Continuous distributed monitoring: a short survey. In Proceedings of the First International Workshop on Algorithms and Models for Distributed Event Processing, AlMoDEP'11, pages 1-10, New York, NY, USA, 2011. ACM. Google Scholar
  22. Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). IEEE Trans. on Knowl. and Data Eng., 15(3):529-540, 2003. Google Scholar
  23. Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM J. Comput., 31(6):1794-1813, 2002. Google Scholar
  24. J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate l1-difference algorithm for massive data streams. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS'99, pages 501-, Washington, DC, USA, 1999. IEEE Computer Society. Google Scholar
  25. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182-209, September 1985. Google Scholar
  26. Sumit Ganguly. Estimating frequency moments of data streams using random linear combinations. In APPROX-RANDOM, pages 369-380, 2004. Google Scholar
  27. Sumit Ganguly. Polynomial estimators for high frequency moments. CoRR, abs/1104.4552, 2011. Google Scholar
  28. Sumit Ganguly. A lower bound for estimating high moments of a data stream. CoRR, abs/1201.0253, 2012. Google Scholar
  29. Sumit Ganguly and Graham Cormode. On estimating frequency moments of data streams. In Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX'07/RANDOM'07, pages 479-493, Berlin, Heidelberg, 2007. Springer-Verlag. Google Scholar
  30. Phillip B. Gibbons and Srikanta Tirthapura. Distributed streams algorithms for sliding windows. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, SPAA'02, pages 63-72, New York, NY, USA, 2002. ACM. Google Scholar
  31. Andre Gronemeier. Asymptotically optimal lower bounds on the nih-multi-party information. CoRR, abs/0902.1609, 2009. Google Scholar
  32. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307-323, 2006. Google Scholar
  33. Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In STOC'05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202-208, New York, NY, USA, 2005. ACM. Google Scholar
  34. T. S. Jayram, Andrew McGregor, S. Muthukrishnan, and Erik Vee. Estimating statistical aggregates on probabilistic data streams. In PODS'07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 243-252, New York, NY, USA, 2007. ACM. Google Scholar
  35. T. S. Jayram and David Woodruff. Optimal bounds for johnson-lindenstrauss transforms and streaming problems with sub-constant error. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'11, pages 1-10. SIAM, 2011. Google Scholar
  36. T.S. Jayram. Hellinger strikes back: A note on the multi-party information complexity of and. In Irit Dinur, Klaus Jansen, Joseph Naor, and José Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 5687 of Lecture Notes in Computer Science, pages 562-573. Springer Berlin Heidelberg, 2009. Google Scholar
  37. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'10, pages 1161-1178, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics. Google Scholar
  38. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS'10, pages 41-52, New York, NY, USA, 2010. ACM. Google Scholar
  39. Ping Li. Compressed counting. In Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'09, pages 412-421, Philadelphia, PA, USA, 2009. Society for Industrial and Applied Mathematics. Google Scholar
  40. Yi Li and DavidP. Woodruff. A tight lower bound for high frequency moment estimation with small error. In Prasad Raghavendra, Sofya Raskhodnikova, Klaus Jansen, and JoséD.P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 8096 of Lecture Notes in Computer Science, pages 623-638. Springer Berlin Heidelberg, 2013. Google Scholar
  41. Morteza Monemizadeh and David P. Woodruff. 1passs relative-error lp-sampling with applications. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'10, pages 1143-1160, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics. Google Scholar
  42. S. Muthukrishnan. Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci., 1(2):117-236, 2005. Google Scholar
  43. Jelani Nelson and David P. Woodruff. Fast Manhattan sketches in data streams. In PODS'10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems of data, pages 99-110, New York, NY, USA, 2010. ACM. Google Scholar
  44. Eric Price and David P. Woodruff. Applications of the shannon-hartley theorem to data streams and sparse recovery. In ISIT, pages 2446-2450, 2012. Google Scholar
  45. J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, v.11 n.1, pp.37-57, 1985. Google Scholar
  46. David Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, SODA'04, pages 167-175, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics. Google Scholar
  47. David P. Woodruff. Frequency moments. In Encyclopedia of Database Systems, pages 1169-1170. Springer, 2009. Google Scholar
  48. David P. Woodruff and Qin Zhang. Tight bounds for distributed functional monitoring. In Proceedings of the 44th symposium on Theory of Computing, STOC'12, pages 941-960, New York, NY, USA, 2012. ACM. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail