Zero-One Laws for Sliding Windows and Universal Sketches
A typical approach in streaming algorithms is to design a sophisticated small-memory algorithm that computes one specific statistic over the stream. Once the stream is gone, computing a different statistic is usually impossible. In this paper, we consider the following possibility: can we collect a small amount of data during the stream that is "universal," i.e., collected without knowing anything about the statistic we will later want to compute, other than the guarantee that, had we known the statistic ahead of time, it could have been computed with small memory? We introduce this notion and answer the question affirmatively, with matching upper and lower bounds: we show that it is possible to collect universal statistics of polylogarithmic size, and we prove that these universal statistics allow us to compute, after the fact, all other statistics that are computable with similar amounts of memory. These results hold both for the standard unbounded streaming model and for the sliding window streaming model.
Streaming Algorithms
Universality
Sliding Windows
573-590
Regular Paper
Vladimir Braverman
Rafail Ostrovsky
Alan Roytman
10.4230/LIPIcs.APPROX-RANDOM.2015.573
List of open problems in sublinear algorithms: Problem 20. URL: http://sublinear.info/20.
List of open problems in sublinear algorithms: Problem 30. URL: http://sublinear.info/30.
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In STOC, 1996.
Z. Bar-Yossef, T. S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In FOCS, 2002.
Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM. Springer, 2002.
L. Bhuvanagiri, S. Ganguly, D. Kesh, and C. Saha. Simpler algorithm for estimating frequency moments of data streams. In SODA, 2006.
V. Braverman, R. Gelles, and R. Ostrovsky. How to catch l₂-heavy-hitters on sliding windows. In COCOON. Springer, 2013.
V. Braverman and R. Ostrovsky. Smooth histograms for sliding windows. In FOCS, 2007.
V. Braverman and R. Ostrovsky. Zero-one frequency laws. In STOC, 2010.
V. Braverman and R. Ostrovsky. Generalizing the layering method of Indyk and Woodruff: Recursive sketches for frequency-based vectors on streams. In APPROX-RANDOM. Springer, 2013.
A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In CCC, 2003.
D. Coppersmith and R. Kumar. An improved data stream algorithm for frequency moments. In SODA, 2004.
G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE TKDE, 15(3):529-540, 2003.
G. Cormode and S. Muthukrishnan. What’s hot and what’s not: tracking most frequent items dynamically. ACM TODS, 30(1):249-278, 2005.
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In SODA, 2002.
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate l₁-difference algorithm for massive data streams. In FOCS, 1999.
P. Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. JCSS, 31(2):182-209, 1985.
S. Ganguly and G. Cormode. On estimating frequency moments of data streams. In APPROX-RANDOM. Springer, 2007.
P. B. Gibbons and S. Tirthapura. Distributed streams algorithms for sliding windows. In SPAA, 2002.
L. Golab, D. DeHaan, E. D. Demaine, A. Lopez-Ortiz, and J. I. Munro. Identifying frequent items in sliding windows over on-line packet streams. In IMC, 2003.
R. Hung, L. K. Lee, and H. F. Ting. Finding frequent items over sliding windows with constant update time. IPL, 110(7):257-260, 2010.
P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. JACM, 53(3):307-323, 2006.
P. Indyk and D. Woodruff. Tight lower bounds for the distinct elements problem. In FOCS, 2003.
P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In STOC, 2005.
C. Jin, W. Qian, C. Sha, J. X. Yu, and A. Zhou. Dynamically maintaining frequent items over a data stream. In CIKM, 2003.
D. M. Kane, J. Nelson, and D. Woodruff. On the exact space complexity of sketching and streaming small norms. In SODA, 2010.
D. M. Kane, J. Nelson, and D. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, 2010.
Y. Li, H. Nguyễn, and D. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In STOC, 2014.
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
G. Nie and Z. Lu. Approximate frequency counts in sliding window over data stream. In CCECE, 2005.
D. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, 2004.
L. Zhang and Y. Guan. Frequency estimation over sliding windows. In ICDE, 2008.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode