Deterministic Heavy Hitters with Sublinear Query Time

Authors Yi Li, Vasileios Nakos



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2018.18.pdf
  • Filesize: 0.53 MB
  • 18 pages

Document Identifiers

Author Details

Yi Li
  • Nanyang Technological University, Singapore
Vasileios Nakos
  • Harvard University, USA

Cite AsGet BibTex

Yi Li and Vasileios Nakos. Deterministic Heavy Hitters with Sublinear Query Time. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.18

Abstract

We study the classic problem of finding l_1 heavy hitters in the streaming model. In the general turnstile model, we give the first deterministic sublinear-time sketching algorithm which takes a linear sketch of length O(epsilon^{-2} log n * log^*(epsilon^{-1})), which is only a factor of log^*(epsilon^{-1}) more than the best existing polynomial-time sketching algorithm (Nelson et al., RANDOM '12). Our approach is based on an iterative procedure, where most unrecovered heavy hitters are identified in each iteration. Although this technique has been extensively employed in the related problem of sparse recovery, this is the first time, to the best of our knowledge, that it has been used in the context of heavy hitters. Along the way we also obtain a sublinear time algorithm for the closely related problem of the l_1/l_1 compressed sensing, matching the space usage of previous (super-)linear time algorithms. In the strict turnstile model, we show that the runtime can be improved and the sketching matrix can be made strongly explicit with O(epsilon^{-2}log^3 n/log^3(1/epsilon)) rows.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • heavy hitters
  • turnstile model
  • sketching algorithm
  • strongly explicit

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms via precision sampling. In Proceedings of the 2011 IEEE 52Nd Annual Symposium on Foundations of Computer Science, FOCS '11, pages 363-372, Washington, DC, USA, 2011. IEEE Computer Society. Google Scholar
  2. R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff, and M. J. Strauss. Combining geometry and combinatorics: A unified approach to sparse signal recovery. In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, pages 798-805, Sept 2008. URL: http://dx.doi.org/10.1109/ALLERTON.2008.4797639.
  3. Radu Berinde, Piotr Indyk, Graham Cormode, and Martin J. Strauss. Space-optimal heavy hitters with strong error bounds. ACM Trans. Database Syst., 35(4):26:1-26:28, 2010. URL: http://dx.doi.org/10.1145/1862919.1862923.
  4. Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Compressed sensing and best k-term approximation. Journal of the American mathematical society, 22(1):211-231, 2009. Google Scholar
  5. Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, 2005. Google Scholar
  6. S Ben Fred, Thomas Bonald, Alexandre Proutiere, Gwénaël Régnié, and James W Roberts. Statistical bandwidth sharing: a study of congestion at flow level. In ACM SIGCOMM Computer Communication Review, volume 31, pages 111-122. ACM, 2001. Google Scholar
  7. Sumit Ganguly. Lower bounds on frequency estimation of data streams. Lecture Notes in Computer Science, 5010:204-215, 2008. Google Scholar
  8. A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin. One sketch for all: Fast algorithms for compressed sensing. In Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, STOC '07, pages 237-246, New York, NY, USA, 2007. ACM. Google Scholar
  9. Anna C Gilbert, Yi Li, Ely Porat, and Martin J Strauss. Approximate sparse recovery: optimizing time and measurements. SIAM Journal on Computing, 41(2):436-453, 2012. Google Scholar
  10. Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss. For-all sparse recovery in near-optimal time. ACM Trans. Algorithms, 13(3):32:1-32:26, 2017. Google Scholar
  11. Nicholas J. A. Harvey, Jelani Nelson, and Krzysztof Onak. Sketching and streaming entropy via approximation theory. In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS '08, pages 489-498, Washington, DC, USA, 2008. IEEE Computer Society. Google Scholar
  12. P. Indyk and M. Ruzic. Near-optimal sparse recovery in the l1 norm. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, pages 199-207, Oct 2008. URL: http://dx.doi.org/10.1109/FOCS.2008.82.
  13. T. S. Jayram and D. P. Woodruff. The data stream space complexity of cascaded norms. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 765-774, Oct 2009. Google Scholar
  14. Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '11, pages 49-58, New York, NY, USA, 2011. ACM. Google Scholar
  15. Daniel M. Kane and Jelani Nelson. Sparser johnson-lindenstrauss transforms. J. ACM, 61(1):4:1-4:23, 2014. URL: http://dx.doi.org/10.1145/2559902.
  16. Kasper Green Larsen, Jelani Nelson, Huy L Nguyen, and Mikkel Thorup. Heavy hitters via cluster-preserving clustering. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 61-70. IEEE, 2016. Google Scholar
  17. Yi Li, Huy L. Nguyen, and David P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Proceedings of the Forty-sixth Annual ACM Symposium on Theory of Computing, STOC '14, pages 174-183, New York, NY, USA, 2014. ACM. URL: http://dx.doi.org/10.1145/2591796.2591812.
  18. Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases, VLDB '02, pages 346-357. VLDB Endowment, 2002. URL: http://dl.acm.org/citation.cfm?id=1287369.1287400.
  19. Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams, pages 398-412. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005. Google Scholar
  20. Morteza Monemizadeh and David P. Woodruff. 1passs relative-error lp-sampling with applications. In Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '10, pages 1143-1160, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics. Google Scholar
  21. Tatsuya Mori, Ryoichi Kawahara, Shozo Naito, and Shigeki Goto. On the characteristics of internet traffic variability: Spikes and elephants. IEICE TRANSACTIONS on Information and Systems, 87:2644-2653, 2004. Google Scholar
  22. Jelani Nelson, Huy L. Nguyên, and David P. Woodruff. On deterministic sketching and streaming for sparse recovery and norm estimation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings, pages 627-638, 2012. Google Scholar
  23. Konstantina Papagiannaki, Nina Taft, S Bhattacharya, Patrick Thiran, Kave Salamatian, and Christophe Diot. On the feasibility of identifying elephants in internet backbone traffic. Sprint Labs, Sprint ATL, Tech. Rep. TR01-ATL-110918, 2001. Google Scholar
  24. Mark S. Pinsker. On the complexity of a concentrator. In Proceedings of 7th International Teletraffic Conference, 1973. Google Scholar
  25. Ely Porat and Martin J. Strauss. Sublinear time, measurement-optimal, sparse recovery for all. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '12, pages 1215-1227, 2012. Google Scholar
  26. Liang Yang, Bryan Ng, and Winston KG Seah. Heavy hitter detection and identification in software defined networking. In Computer Communication and Networks (ICCCN), 2016 25th International Conference on, pages 1-10. IEEE, 2016. Google Scholar
  27. Yin Zhang, Lee Breslau, Vern Paxson, and Scott Shenker. On the characteristics and origins of internet flow rates. In ACM SIGCOMM Computer Communication Review, volume 32, pages 309-322. ACM, 2002. Google Scholar
  28. Yu Zhang, BinXing Fang, and YongZheng Zhang. Identifying heavy hitters in high-speed network monitoring. Science China Information Sciences, 53(3):659-676, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail