Tracking the l_2 Norm with Constant Update Time
The l_2 tracking problem is the task of obtaining a streaming algorithm that, given access to a stream of items a_1,a_2,a_3,... from a universe [n], outputs at each time t an estimate to the l_2 norm of the frequency vector f^{(t)}in R^n (where f^{(t)}_i is the number of occurrences of item i in the stream up to time t). The previous work [Braverman-Chestnut-Ivkin-Nelson-Wang-Woodruff, PODS 2017] gave a streaming algorithm with (the optimal) space using O(epsilon^{-2}log(1/delta)) words and O(epsilon^{-2}log(1/delta)) update time to obtain an epsilon-accurate estimate with probability at least 1-delta. We give the first algorithm that achieves update time of O(log 1/delta) which is independent of the accuracy parameter epsilon, together with the nearly optimal space using O(epsilon^{-2}log(1/delta)) words. Our algorithm is obtained using the Count Sketch of [Charilkar-Chen-Farach-Colton, ICALP 2002].
Streaming algorithms
Sketching algorithms
Tracking
CountSketch
Theory of computation~Sketching and sampling
2:1-2:15
APPROX
A full version of the paper is available at https://arxiv.org/abs/1807.06479.
The authors wish to thank Jelani Nelson for invaluable advice throughout the course of this research. We also thank Mitali Bafna and Jarosław Błasiok for useful discussion and thank Boaz Barak for many helpful comments on an earlier draft of this article. We are also grateful to reviewers' comments.
Chi-Ning
Chou
Chi-Ning Chou
School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA
http://cnchou.github.io
Supported by NSF awards CCF 1565264 and CNS 1618026.
Zhixian
Lei
Zhixian Lei
School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA
Supported by NSF awards CCF 1565264 and CNS 1618026.
Preetum
Nakkiran
Preetum Nakkiran
School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA
http://preetum.nakkiran.org
Work supported in part by a Simons Investigator Award, NSF Awards CCF 1565641 and CCF 1715187, and the NSF Graduate Research Fellowship Grant No. DGE1144152.
10.4230/LIPIcs.APPROX-RANDOM.2019.2
Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20-29. ACM, 1996.
Andrew C Berry. The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the american mathematical society, 49(1):122-136, 1941.
Jaroslaw Blasiok, Jian Ding, and Jelani Nelson. Continuous Monitoring of l_p Norms in Data Streams. In LIPIcs-Leibniz International Proceedings in Informatics, volume 81. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P Woodruff. BPTree: An 𝓁₂ heavy hitters algorithm using constant memory. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 361-376. ACM, 2017.
Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, and David P Woodruff. Beating CountSketch for heavy hitters in insertion streams. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 740-753. ACM, 2016.
Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693-703. Springer, 2002.
Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, 2005.
Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson-Lindenstrauss transform. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 341-350. ACM, 2010.
Carl-Gustaf Esseen. On the Liapounoff limit of error in the theory of probability. Almqvist & Wiksell Stockholm, 1942.
David Lee Hanson and Farroll Tim Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079-1083, 1971.
Zengfeng Huang, Wai Ming Tai, and Ke Yi. Tracking the Frequency Moments at All Times. arXiv preprint, 2014. URL: http://arxiv.org/abs/1412.1763.
http://arxiv.org/abs/1412.1763
Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307-323, 2006.
Tadeusz Inglot and Teresa Ledwina. Asymptotic optimality of new adaptive test in regression model. Annales de l'Institut Henri Poincare (B) Probability and Statistics, 42(5):579-590, 2006.
T. S. Jayram and David P. Woodruff. Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error. ACM Trans. Algorithms, 9(3):26:1-26:17, June 2013. URL: https://doi.org/10.1145/2483699.2483706.
https://doi.org/10.1145/2483699.2483706
Daniel Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 628-639. Springer, 2011.
Daniel M Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM (JACM), 61(1):4, 2014.
Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745-754. ACM, 2011.
Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745-754. ACM, 2011.
Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 41-52. ACM, 2010.
Daniel M Kane, Jelani Nelson, and David P Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161-1178. Society for Industrial and Applied Mathematics, 2010.
Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, and Yan Chen. Sketch-based change detection: methods, evaluation, and applications. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 234-247. ACM, 2003.
Kasper Green Larsen, Jelani Nelson, and Huy L Nguyên. Time lower bounds for nonadaptive turnstile streaming algorithms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 803-812. ACM, 2015.
Mikkel Thorup and Yin Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In SODA, volume 4, pages 615-624, 2004.
Mikkel Thorup and Yin Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM Journal on Computing, 41(2):293-331, 2012.
Chi-Ning Chou, Zhixian Lei, and Preetum Nakkiran
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode