A Dyadic Simulation Approach to Efficient Range-Summability

Authors Jingfan Meng, Huayi Wang, Jun Xu, Mitsunori Ogihara



File

LIPIcs.ICDT.2022.17.pdf
  • Filesize: 0.68 MB
  • 18 pages

Author Details

Jingfan Meng
  • School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Huayi Wang
  • School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Jun Xu
  • School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Mitsunori Ogihara
  • Department of Computer Science, University of Miami, Coral Gables, FL, USA

Cite As

Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara. A Dyadic Simulation Approach to Efficient Range-Summability. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ICDT.2022.17

Abstract

Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem that arises in three important database applications, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor searches (ANNS). In this work, we propose a novel dyadic simulation framework and use it to develop three novel ERS solutions, namely Gaussian-dyadic simulation tree (DST), Cauchy-DST, and Random Walk-DST. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to have both high computational efficiency and strong provable independence guarantees.
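As a rough illustration of the dyadic simulation idea for the Gaussian case only, the following Python sketch answers range-sum queries over 2^h standard normal variables by walking a binary tree from the root. It relies on a standard fact: if a node's sum S covers 2m i.i.d. N(0, 1) variables, then the left half's sum, conditioned on S, is distributed as N(S/2, m/2). The class name `GaussianDST`, the helper `_node_normal`, and the SHA-256-based per-node seeding are all assumptions made for this example. The sketch does not reproduce the paper's rejection sampling techniques or its k-wise independence construction, and it is not the authors' implementation.

```python
import hashlib
import math
import random


def _node_normal(key: str) -> float:
    """Deterministic N(0, 1) draw for a tree node.

    Hash-based seeding is an illustrative stand-in (an assumption),
    not the hashing scheme analyzed in the paper.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big")).gauss(0.0, 1.0)


class GaussianDST:
    """Hypothetical sketch of a dyadic simulation tree over 2**height N(0, 1) variables."""

    def __init__(self, height: int, seed: int = 0):
        self.height = height
        self.seed = seed
        self.size = 1 << height
        # The sum of all `size` variables is N(0, size).
        self.root_sum = math.sqrt(self.size) * _node_normal(f"{seed}:root")

    def prefix_sum(self, r: int) -> float:
        """Sum of X_0 .. X_{r-1}, computed in O(height) node splits."""
        if not 0 <= r <= self.size:
            raise ValueError("r out of range")
        total = 0.0
        lo, level, index, node_sum = 0, self.height, 0, self.root_sum
        while r > lo:
            width = 1 << level
            if r >= lo + width:
                # The current node lies entirely inside the prefix.
                total += node_sum
                break
            half = width // 2
            # Conditioned on node_sum, the left half is N(node_sum / 2, half / 2).
            z = _node_normal(f"{self.seed}:split:{level}:{index}")
            left = node_sum / 2 + math.sqrt(half / 2) * z
            if r >= lo + half:
                # Left child is fully covered; continue into the right child.
                total += left
                lo += half
                node_sum -= left
                index = 2 * index + 1
            else:
                # Prefix ends inside the left child.
                node_sum = left
                index = 2 * index
            level -= 1
        return total

    def range_sum(self, a: int, b: int) -> float:
        """Sum of X_a .. X_{b-1}, as a difference of two prefix sums."""
        if not 0 <= a <= b <= self.size:
            raise ValueError("invalid range")
        return self.prefix_sum(b) - self.prefix_sum(a)
```

Because every split is keyed by the seed and the node's position, repeated queries see the same underlying variables, so `range_sum(i, i + 1)` values add up to the corresponding longer range sum, up to floating-point rounding. Each query touches at most one node per level, which is where the logarithmic cost comes from.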

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Mathematics of computing → Random number generation
Keywords
  • fast range-summation
  • locality-sensitive hashing
  • rejection sampling
