Sketching, Moment Estimation, and the Lévy-Khintchine Representation Theorem

Authors: Seth Pettie, Dingyu Wang




Author Details

Seth Pettie
  • University of Michigan, Ann Arbor, MI, USA
Dingyu Wang
  • University of Michigan, Ann Arbor, MI, USA

Cite As

Seth Pettie and Dingyu Wang. Sketching, Moment Estimation, and the Lévy-Khintchine Representation Theorem. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 77:1-77:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ITCS.2025.77

Abstract

In the d-dimensional turnstile streaming model, a frequency vector 𝐱 = (𝐱(1),…,𝐱(n)) ∈ (ℝ^d)ⁿ is updated entry-wise over a stream. We consider the problem of f-moment estimation, in which one wants to estimate f(𝐱) = ∑_{v ∈ [n]} f(𝐱(v)) with a small-space sketch. A function f is tractable if the f-moment can be estimated to within a constant factor using polylog(n) space.
The f-moment estimation problem has been intensively studied in the d = 1 case. Flajolet and Martin estimate the F₀-moment (f(x) = 𝟙(x > 0), incremental streams); Alon, Matias, and Szegedy estimate the L₂-moment (f(x) = x²); Indyk estimates the L_α-moment (f(x) = |x|^α) for α ∈ (0,2]. For d ≥ 2, Ganguly, Bansal, and Dube estimate the L_{p,q} hybrid moment (f: ℝ^d → ℝ, f(x) = (∑_{j = 1}^d |x_j|^p)^q) for p ∈ (0,2], q ∈ (0,1). On the tractability side, Bar-Yossef, Jayram, Kumar, and Sivakumar show that f(x) = |x|^α is not tractable for α > 2. Braverman, Chestnut, Woodruff, and Yang characterize the class of tractable one-variable functions except for a class of nearly periodic functions.
In this work we present a simple and generic scheme to construct sketches with the novel idea of hashing indices to Lévy processes, from which one can estimate the f-moment f(𝐱) where f is the characteristic exponent of the Lévy process. The fundamental Lévy-Khintchine representation theorem completely characterizes the space of all possible characteristic exponents, which in turn characterizes the set of f-moments that can be estimated by this generic scheme. 
The new scheme has strong explanatory power. It unifies the construction of many existing sketches (F₀, L₀, L₂, L_α, L_{p,q}, etc.) and it implies the tractability of many nearly periodic functions that were previously unclassified. Furthermore, the scheme can be conveniently generalized to multidimensional cases (d ≥ 2) by considering multidimensional Lévy processes, and can be further generalized to estimate heterogeneous moments by projecting different indices with different Lévy processes. We conjecture that the set of tractable functions can be characterized using the Lévy-Khintchine representation theorem via what we call the Fourier-Hahn-Lévy method.
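
The hashing idea in the abstract can be illustrated with a minimal Python sketch, under stated assumptions. It is not the authors' construction: the abstract does not give the estimator, so this takes the simplest case, d = 1 with Brownian motion as the Lévy process, and assumes the sign convention E[exp(i·z·X_t)] = exp(−t·f(z)), which gives f(z) = z²/2. Each index v is hashed to an independent copy of X_t, the sketch keeps k linear counters C_j = ∑_v 𝐱(v)·X_t^{(j,v)}, and independence then gives E[cos(C_j)] = exp(−t·f(𝐱)). The names LevySketch and levy_value, the fixed time parameter t, and the use of a seeded random.Random as the hash function are all illustrative choices.

```python
import math
import random


def levy_value(seed, j, v, t):
    """Hypothetical hash: map (j, v) to the value at time t of a Brownian
    motion, i.e. a N(0, t) sample, reproducible from the key alone."""
    rng = random.Random(f"{seed}:{j}:{v}")
    return math.sqrt(t) * rng.gauss(0.0, 1.0)


class LevySketch:
    """Linear sketch with k counters for f(x) = sum_v x(v)^2 / 2."""

    def __init__(self, k, t, seed=0):
        self.k = k          # number of independent counters
        self.t = t          # time at which the Levy process is sampled
        self.seed = seed
        self.counters = [0.0] * k

    def update(self, v, delta):
        """Turnstile update x(v) += delta; delta may be negative."""
        for j in range(self.k):
            self.counters[j] += delta * levy_value(self.seed, j, v, self.t)

    def estimate(self):
        """Invert E[cos(C_j)] = exp(-t * f(x)) using the empirical mean."""
        mean_cos = sum(math.cos(c) for c in self.counters) / self.k
        if mean_cos <= 0.0:
            raise ValueError("t is too large for this stream; use a smaller t")
        return -math.log(mean_cos) / self.t


if __name__ == "__main__":
    sketch = LevySketch(k=2000, t=0.03, seed=1)
    # Final frequencies: x(1) = 3, x(2) = -4, x(5) = 2, x(7) = 0.
    for v, delta in [(1, 5), (2, -4), (1, -2), (5, 2), (7, 1), (7, -1)]:
        sketch.update(v, delta)
    exact = (3**2 + 4**2 + 2**2) / 2
    print("exact:", exact, "estimate:", sketch.estimate())
```

A single fixed t only gives a usable estimate when t·f(𝐱) is of constant order; a full algorithm would have to choose or search over t, which this sketch does not attempt. Swapping the Gaussian for another infinitely divisible distribution would change the exponent f accordingly.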

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • Streaming Sketches
  • Lévy Processes


References

  1. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: https://doi.org/10.1006/jcss.1997.1545.
  2. Ziv Bar-Yossef, Thathachar S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences, 68(4):702-732, 2004. URL: https://doi.org/10.1016/J.JCSS.2003.11.006.
  3. Vladimir Braverman and Stephen R. Chestnut. Universal sketches for the frequency negative moments and other decreasing streaming sums. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, page 591, 2015.
  4. Vladimir Braverman, Stephen R. Chestnut, David P. Woodruff, and Lin F. Yang. Streaming space complexity of nearly all functions of one variable on frequency vectors. In Proceedings 35th ACM Symposium on Principles of Database Systems (PODS), pages 261-276, 2016. URL: https://doi.org/10.1145/2902251.2902282.
  5. Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger. An optimal algorithm for large frequency moments using O(n^{1-2/k}) bits. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.531.
  6. Vladimir Braverman and Rafail Ostrovsky. Zero-one frequency laws. In Proceedings 42nd ACM Symposium on Theory of Computing (STOC), pages 281-290, 2010. URL: https://doi.org/10.1145/1806689.1806729.
  7. Vladimir Braverman and Rafail Ostrovsky. Generalizing the layering method of Indyk and Woodruff: Recursive sketches for frequency-based vectors on streams. In Proceedings 16th International Workshop on Approximation, Randomization, and Combinatorial Optimization (APPROX), volume 8096 of Lecture Notes in Computer Science, pages 58-70. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-40328-6_5.
  8. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In 18th IEEE Annual Conference on Computational Complexity, 2003. Proceedings., pages 107-117. IEEE, 2003. URL: https://doi.org/10.1109/CCC.2003.1214414.
  9. Stephen Robert Chestnut. Stream sketches, sampling, and sabotage. PhD thesis, Johns Hopkins University, 2015.
  10. Graham Cormode, Mayur Datar, Piotr Indyk, and Shanmugavelayutham Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Transactions on Knowledge and Data Engineering, 15(3):529-540, 2003. URL: https://doi.org/10.1109/TKDE.2003.1198388.
  11. Graham Cormode and Donatella Firmani. A unifying framework for 𝓁₀-sampling algorithms. Distributed Parallel Databases, 32(3):315-335, 2014. URL: https://doi.org/10.1007/S10619-013-7131-9.
  12. Rick Durrett. Probability: theory and examples. Cambridge University Press, 2019.
  13. Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Proceedings of the 18th International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA), pages 127-146, 2007.
  14. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182-209, 1985. URL: https://doi.org/10.1016/0022-0000(85)90041-8.
  15. Sumit Ganguly. Estimating frequency moments of data streams using random linear combinations. In Proceedings 8th International Workshop on Randomization and Computation (RANDOM), volume 3122 of Lecture Notes in Computer Science, pages 369-380, 2004. URL: https://doi.org/10.1007/978-3-540-27821-4_33.
  16. Sumit Ganguly, Mohit Bansal, and Shruti Dube. Estimating hybrid frequency moments of data streams. Journal of Combinatorial Optimization, 23(3):373-394, 2012. URL: https://doi.org/10.1007/S10878-010-9339-1.
  17. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307-323, 2006. URL: https://doi.org/10.1145/1147954.1147955.
  18. Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202-208, 2005. URL: https://doi.org/10.1145/1060590.1060621.
  19. T. S. Jayram and David P. Woodruff. The data stream space complexity of cascaded norms. In Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS '09, pages 765-774, USA, 2009. IEEE Computer Society. URL: https://doi.org/10.1109/FOCS.2009.82.
  20. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings 29th ACM Symposium on Principles of Database Systems (PODS), pages 41-52, 2010. URL: https://doi.org/10.1145/1807085.1807094.
  21. Kevin J. Lang. Back to the future: an even more nearly optimal cardinality estimation algorithm. CoRR, abs/1708.06839, 2017. URL: https://arxiv.org/abs/1708.06839.
  22. Ping Li. Estimators and tail bounds for dimension reduction in l_α (0 < α ≤ 2) using stable random projections. In Proceedings 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 10-19, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347084.
  23. Yi Li, Huy L. Nguyen, and David P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Proceedings 46th Annual ACM Symposium on Theory of Computing (STOC), pages 174-183, 2014. URL: https://doi.org/10.1145/2591796.2591812.
  24. Yi Li and David P. Woodruff. Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings. In Klaus Jansen, Claire Mathieu, José D. P. Rolim, and Chris Umans, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016), volume 60 of Leibniz International Proceedings in Informatics (LIPIcs), pages 39:1-39:11, Dagstuhl, Germany, 2016. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2016.39.
  25. Robert Morris. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840-842, 1978. URL: https://doi.org/10.1145/359619.359627.
  26. J. Ian Munro and Mike Paterson. Selection and sorting with limited storage. Theor. Comput. Sci., 12:315-323, 1980. URL: https://doi.org/10.1016/0304-3975(80)90061-4.
  27. Seth Pettie and Dingyu Wang. Information theoretic limits of cardinality estimation: Fisher meets Shannon. In Proceedings 53rd Annual ACM Symposium on Theory of Computing (STOC), pages 556-569, 2021. URL: https://doi.org/10.1145/3406325.3451032.
  28. Eric Price and David P. Woodruff. Applications of the Shannon-Hartley theorem to data streams and sparse recovery. In 2012 IEEE International Symposium on Information Theory Proceedings, pages 2446-2450. IEEE, 2012. URL: https://doi.org/10.1109/ISIT.2012.6283954.
  29. Ken-Iti Sato. Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999.
  30. Dingyu Wang and Seth Pettie. Better cardinality estimators for HyperLogLog, PCSA, and beyond. In Proceedings 42nd ACM Symposium on Principles of Database Systems (PODS), pages 317-327, 2023. URL: https://doi.org/10.1145/3584372.3588680.