New Verification Schemes for Frequency-Based Functions on Data Streams

Author Prantar Ghosh



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2020.22.pdf
  • Filesize: 485 kB
  • 15 pages

Document Identifiers

Author Details

Prantar Ghosh
  • Dartmouth College, Hanover, NH, USA

Acknowledgements

The author would like to thank Amit Chakrabarti and Justin Thaler for several helpful discussions. He is also grateful to the anonymous FSTTCS 2020 reviewers for their valuable comments, especially to the reviewer who gave a sketch of a scheme using a randomized estimation algorithm, pointing out that it can handle longer streams and that determinism isn't strictly necessary for the subroutine.

Cite AsGet BibTex

Prantar Ghosh. New Verification Schemes for Frequency-Based Functions on Data Streams. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.FSTTCS.2020.22

Abstract

We study the general problem of computing frequency-based functions, i.e., the sum of any given function of data stream frequencies. Special cases include fundamental data stream problems such as computing the number of distinct elements (F₀), frequency moments (F_k), and heavy-hitters. It can also be applied to calculate the maximum frequency of an element (F_{∞}). Given that exact computation of most of these special cases provably do not admit any sublinear space algorithm, a natural approach is to consider them in an enhanced data streaming model, where we have a computationally unbounded but untrusted prover that can send proofs or help messages to ease the computation. Think of a memory-restricted client delegating the computation to a powerful cloud service. The client does not blindly trust the cloud, and with its limited memory, it wants to verify the proof that the cloud sends. Chakrabarti et al. (ICALP '09) introduced this model as the annotated data streaming model and showed that multiple problems including exact computation of frequency-based functions - that have no sublinear algorithms in basic streaming - do have algorithms, also called schemes, in the annotated streaming model with both space and proof-length sublinear in the input size. We give a general scheme for computing any frequency-based function with both space usage and proof-size of O(n^{2/3}log n) bits, where n is the size of the universe. This improves upon the best known bound of O(n^{2/3}log^{4/3} n) given by the seminal paper of Chakrabarti et al. and as a result, also improves upon the best known bounds for the important special cases of computing F₀ and F_{∞}. We emphasize that while being quantitatively better, our scheme is also qualitatively better in the sense that it is simpler than the previously best scheme that uses intricate data structures and elaborate subroutines. Our scheme uses a simple technique tailored for this model: the verifier solves the problem partially by running an algorithm known to be helpful for it in the basic (sans prover) streaming model and then takes the prover’s help to solve the remaining part.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming models
  • Theory of computation → Interactive proof systems
  • Computer systems organization → Cloud computing
Keywords
  • data streams
  • interactive proofs
  • Arthur-Merlin

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amirali Abdullah, Samira Daruki, Chitradeep Dutta Roy, and Suresh Venkatasubramanian. Streaming verification of graph properties. In Proc. 27th International Symposium on Algorithms and Computation, pages 3:1-3:14, 2016. Google Scholar
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. Preliminary version in Proc. 28th Annual ACM Symposium on the Theory of Computing , pages 20-29, 1996. Google Scholar
  3. Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. ACM, 45(3):501-555, 1998. Preliminary version in Proc. 33rd Annual IEEE Symposium on Foundations of Computer Science , pages 14-23, 1992. Google Scholar
  4. Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: A new characterization of NP. J. ACM, 45(1):70-122, 1998. Preliminary version in Proc. 33rd Annual IEEE Symposium on Foundations of Computer Science , pages 2-13, 1992. Google Scholar
  5. Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In Proc. 6th International Workshop on Randomization and Approximation Techniques in Computer Science, pages 128-137, 2002. Google Scholar
  6. Amit Chakrabarti, Graham Cormode, Navin Goyal, and Justin Thaler. Annotations for sparse data streams. In Proc. 25th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 687-706, 2014. Google Scholar
  7. Amit Chakrabarti, Graham Cormode, Andrew McGregor, and Justin Thaler. Annotations in data streams. ACM Trans. Alg., 11(1):Article 7, 2014. Google Scholar
  8. Amit Chakrabarti, Graham Cormode, Andrew McGregor, Justin Thaler, and Suresh Venkatasubramanian. Verifiable stream computation and Arthur-Merlin communication. In Proc. 30th Annual IEEE Conference on Computational Complexity, pages 217-243, 2015. Google Scholar
  9. Amit Chakrabarti and Prantar Ghosh. Streaming verification of graph computations via graph structure. In Proc. 33rd International Workshop on Randomization and Approximation Techniques in Computer Science, pages 70:1-70:20, 2019. Google Scholar
  10. Amit Chakrabarti, Prantar Ghosh, and Justin Thaler. Streaming verification for graph problems: Optimal tradeoffs and nonlinear sketches. To appear in RANDOM, 2020. URL: http://arxiv.org/abs/2007.03039.
  11. Graham Cormode, Michael Mitzenmacher, and Justin Thaler. Streaming graph computations with a helpful advisor. In Proc. 18th Annual European Symposium on Algorithms, pages 231-242, 2010. Google Scholar
  12. Graham Cormode, Michael Mitzenmacher, and Justin Thaler. Streaming graph computations with a helpful advisor. Algorithmica, 65(2):409-442, 2013. Google Scholar
  13. Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Alg., 55(1):58-75, 2005. Preliminary version in Proc. 6th Latin American Theoretical Informatics Symposium , pages 29-38, 2004. Google Scholar
  14. Graham Cormode, Justin Thaler, and Ke Yi. Verifying computations with streaming interactive proofs. Proc. VLDB Endowment, 5(1):25-36, 2011. Google Scholar
  15. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182-209, 1985. Google Scholar
  16. Sumit Ganguly and Graham Cormode. On estimating frequency moments of data streams. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 10th International Workshop, APPROX 2007, and 11th International Workshop, RANDOM 2007, volume 4627 of Lecture Notes in Computer Science, pages 479-493, 2007. Google Scholar
  17. Tom Gur and Ran Raz. Arthur-Merlin streaming complexity. In Proc. 40th International Colloquium on Automata, Languages and Programming, pages 528-539, 2013. Google Scholar
  18. Piotr Indyk and David P. Woodruff. Optimal approximations of the frequency moments of data streams. In Proc. 37th Annual ACM Symposium on the Theory of Computing, pages 202-208, 2005. Google Scholar
  19. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1161-1178, 2010. Google Scholar
  20. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. 29th ACM Symposium on Principles of Database Systems, pages 41-52, 2010. Google Scholar
  21. Hartmut Klauck and Ved Prakash. Streaming computations with a loquacious prover. In Proc. 4th Conference on Innovations in Theoretical Computer Science, pages 305-320, 2013. Google Scholar
  22. Hartmut Klauck and Ved Prakash. An improved interactive streaming algorithm for the distinct elements problem. In Automata, Languages, and Programming - 41st International Colloquium (ICALP), volume 8572 of LNCS, pages 919-930, 2014. Google Scholar
  23. Feifei Li, Ke Yi, Marios Hadjieleftheriou, and George Kollios. Proof-infused streams: Enabling authentication of sliding window queries on streams. In Proc. 33rd International Conference on Very Large Data Bases, pages 147-158, 2007. Google Scholar
  24. Carsten Lund, Lance Fortnow, Howard J. Karloff, and Noam Nisan. Algebraic methods for interactive proof systems. J. ACM, 39(4):859-868, 1992. Google Scholar
  25. Jayadev Misra and David Gries. Finding repeated elements. Sci. Comput. Program., 2(2):143-152, 1982. Google Scholar
  26. Jelani Nelson, Huy L. Nguyên, and David P. Woodruff. On deterministic sketching and streaming for sparse recovery and norm estimation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, volume 7408 of Lecture Notes in Computer Science, pages 627-638, 2012. Google Scholar
  27. Stavros Papadopoulos, Yin Yang, and Dimitris Papadias. Cads: Continuous authentication on data streams. In Proc. 33rd International Conference on Very Large Data Bases, pages 135-146, 2007. Google Scholar
  28. Adi Shamir. IP = PSPACE. J. ACM, 39(4):869-877, 1992. Google Scholar
  29. Justin Thaler. Data stream verification. In Encyclopedia of Algorithms, pages 494-499. Springer Berlin Heidelberg, 2016. Google Scholar
  30. Justin Thaler. Semi-streaming algorithms for annotated graph streams. In Proc. 43rd International Colloquium on Automata, Languages and Programming, pages 59:1-59:14, 2016. Google Scholar
  31. Peter A. Tucker, David Maier, Lois M. L. Delcambre, Tim Sheard, Jennifer Widom, and Mark P. Jones. Punctuated data streams, 2005. Google Scholar
  32. David P. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 167-175, 2004. Google Scholar
  33. Ke Yi, Feifei Li, Marios Hadjieleftheriou, George Kollios, and Divesh Srivastava. Randomized synopses for query assurance on data streams. In Proc. 24th International Conference on Data Engineering, pages 416-425, 2008. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail