On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions

Authors: Jingfan Meng, Huayi Wang, Jun Xu, Mitsunori Ogihara

Author Details

Jingfan Meng
  • School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Huayi Wang
  • School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Jun Xu
  • School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Mitsunori Ogihara
  • Department of Computer Science, University of Miami, Coral Gables, FL, USA

Cite As

Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara. On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICDT.2023.21

Abstract

d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem with applications to two important families of database problems, namely fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether efficient solutions exist for the dD-ERS problem and whether they exist for the latter database problem have been two long-standing open problems. Both are solved in this work. Specifically, we propose a novel solution framework for dD-ERS on RVs with Gaussian or Poisson distributions. Our dD-ERS solutions are the first to achieve polylogarithmic time complexity. Furthermore, we develop a novel k-wise independence theory that gives our dD-ERS solutions both high computational efficiency and strong provable independence guarantees. Finally, we show that, under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.
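To make the dD-ERS problem concrete, here is a minimal sketch for d = 2 and IID standard Gaussian RVs on a 2^n × 2^n grid. It relies on one standard fact: an orthonormal transform of a vector of IID N(0, 1) variables is again IID N(0, 1). One can therefore draw the tensor-product Haar wavelet coefficients as IID N(0, 1) and define the grid's RVs implicitly as their inverse transform. The sum over any rectangle then equals the inner product of the rectangle's indicator with those coefficients, and only the at most (2n + 1)^2 coefficients whose supports contain a rectangle edge contribute, which is polylogarithmic in the grid size. This is an illustration under our own assumptions, not the paper's construction: the function names are hypothetical, and seeding Python's `random.Random` from a BLAKE2b hash of each coefficient index is a heuristic stand-in that carries no provable independence guarantee, unlike the k-wise independence theory described above.

```python
import hashlib
import math
import random


def haar_coeffs_of_interval(l, r, n):
    """Nonzero inner products <1_[l, r), b> over the orthonormal 1D Haar basis b on 2**n points.

    Keys are ("phi",) for the scaling function and ("psi", j, k) for the wavelet
    at level j (support length 2**n >> j) and position k.
    """
    N = 1 << n
    assert 0 <= l < r <= N, "need a non-empty interval inside [0, 2**n)"
    out = {("phi",): (r - l) / math.sqrt(N)}
    for j in range(n):
        s = N >> j                      # support length of level-j wavelets
        half = s // 2
        # Only wavelets whose support contains l or r - 1 can have a nonzero product.
        for k in {l // s, (r - 1) // s}:
            start, mid, end = k * s, k * s + half, (k + 1) * s
            pos = max(0, min(r, mid) - max(l, start))   # overlap with the +1/sqrt(s) half
            neg = max(0, min(r, end) - max(l, mid))     # overlap with the -1/sqrt(s) half
            if pos != neg:
                out[("psi", j, k)] = (pos - neg) / math.sqrt(s)
    return out


def gaussian_coeff(seed, key):
    """Deterministic N(0, 1) draw for one 2D Haar coefficient.

    ASSUMPTION: hash-seeded PRNG as a heuristic; not a k-wise independent family.
    """
    digest = hashlib.blake2b(f"{seed}|{key!r}".encode(), digest_size=8).digest()
    return random.Random(int.from_bytes(digest, "big")).gauss(0.0, 1.0)


def range_sum_2d(seed, n, x_lo, x_hi, y_lo, y_hi):
    """Sum of the implicit IID N(0, 1) RVs over [x_lo, x_hi) x [y_lo, y_hi)."""
    cx = haar_coeffs_of_interval(x_lo, x_hi, n)
    cy = haar_coeffs_of_interval(y_lo, y_hi, n)
    # Tensor-product basis: the rectangle indicator's coefficient is wx * wy.
    return sum(
        wx * wy * gaussian_coeff(seed, (a, b))
        for a, wx in cx.items()
        for b, wy in cy.items()
    )
```

For example, `range_sum_2d(42, 10, 3, 700, 5, 1000)` sums the implicit RVs over the rectangle [3, 700) × [5, 1000) of a 1024 × 1024 grid while evaluating at most 21 × 21 coefficients, and repeated calls with the same seed return the same value. Because a single RV is just a 1 × 1 rectangle, point queries and range queries are answered from the same coefficients and are therefore mutually consistent.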

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • fast range-summation
  • multidimensional data streams
  • Haar wavelet transform
