Improved Streaming Algorithm for the Klee’s Measure Problem and Generalizations

Authors Mridul Nandi, N. V. Vinodchandran, Arijit Ghosh, Kuldeep S. Meel, Soumit Pal, Sourav Chakraborty




File

LIPIcs.APPROX-RANDOM.2024.26.pdf
  • Filesize: 0.88 MB
  • 21 pages

Author Details

Mridul Nandi
  • Indian Statistical Institute, Kolkata, India
N. V. Vinodchandran
  • University of Nebraska, Lincoln, USA
Arijit Ghosh
  • Indian Statistical Institute, Kolkata, India
Kuldeep S. Meel
  • University of Toronto, Canada
Soumit Pal
  • Indian Statistical Institute, Kolkata, India
Sourav Chakraborty
  • Indian Statistical Institute, Kolkata, India

Acknowledgements

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) [RGPIN-2024-05956].

Cite As

Mridul Nandi, N. V. Vinodchandran, Arijit Ghosh, Kuldeep S. Meel, Soumit Pal, and Sourav Chakraborty. Improved Streaming Algorithm for the Klee’s Measure Problem and Generalizations. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 26:1-26:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2024.26

Abstract

Estimating the size of the union of a stream of sets S₁, S₂, …, S_M, where each set is a subset of a known universe Ω, is a fundamental problem in data streaming. This problem naturally generalizes the well-studied 𝖥₀ estimation problem in the streaming literature, where each set contains a single element from the universe. We consider the general case in which the sets S_i can be succinctly represented and allow efficient membership, cardinality, and sampling queries (called a Delphic family of sets). A notable example in this framework is Klee’s Measure Problem (KMP), where every set S_i is an axis-parallel rectangle in d-dimensional space (Ω = [Δ]^d, where [Δ] := {1, …, Δ} and Δ ∈ ℕ). Recently, Meel, Chakraborty, and Vinodchandran (PODS-21, PODS-22) designed a streaming algorithm for (ε,δ)-estimation of the size of the union of set streams over a Delphic family with space and update time complexity O((log³|Ω|)/ε² ⋅ log(1/δ)) and Õ((log⁴|Ω|)/ε² ⋅ log(1/δ)), respectively. This work presents a new, sampling-based algorithm for estimating the size of the union of Delphic sets that has space and update time complexity Õ((log²|Ω|)/ε² ⋅ log(1/δ)). This improves the space complexity bound by a log|Ω| factor and the update time complexity bound by a log²|Ω| factor. A critical question is whether the quadratic dependence of the space and update time complexities on log|Ω| is necessary. Specifically, can we design a streaming algorithm for estimating the size of the union of sets over a Delphic family with space complexity linear in log|Ω| and update time poly(log|Ω|)? While this appears technically challenging, we show that establishing a space lower bound of ω(log|Ω|) for algorithms with poly(log|Ω|) update time is beyond the reach of current techniques. Specifically, we show that under a certain hard-to-prove computational complexity hypothesis, there is a streaming algorithm for the problem with optimal space complexity O(log|Ω|) and update time poly(log|Ω|). Thus, establishing a space lower bound of ω(log|Ω|) would lead to breakthrough complexity class separation results.
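To make the Delphic setting concrete, the following is a minimal Python sketch of the three queries (membership, cardinality, sampling) for axis-parallel rectangles, together with a simple sampling-based union estimator. It is an illustration only and is not the algorithm of this paper: the names `Rectangle`, `binomial`, `estimate_union`, and the parameter `thresh` are hypothetical, the threshold is left to the caller rather than derived from ε and δ, and the naive binomial sampler takes time linear in the set size, which a real streaming algorithm could not afford.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

Point = Tuple[int, ...]


@dataclass(frozen=True)
class Rectangle:
    """Axis-parallel rectangle in [Delta]^d with inclusive bounds per axis."""
    lo: Tuple[int, ...]
    hi: Tuple[int, ...]

    def contains(self, x: Point) -> bool:
        # Membership query.
        return all(l <= v <= h for l, v, h in zip(self.lo, x, self.hi))

    def cardinality(self) -> int:
        # Cardinality query: product of the side lengths.
        size = 1
        for l, h in zip(self.lo, self.hi):
            size *= h - l + 1
        return size

    def sample(self, rng: random.Random) -> Point:
        # Sampling query: a uniformly random point of the rectangle.
        return tuple(rng.randint(l, h) for l, h in zip(self.lo, self.hi))


def binomial(n: int, p: float, rng: random.Random) -> int:
    # Naive Binomial(n, p) sampler (O(n) time); for illustration only.
    return sum(rng.random() < p for _ in range(n))


def estimate_union(stream: Iterable[Rectangle], thresh: int,
                   rng: random.Random) -> float:
    """Keep a sample X in which each point of the union seen so far
    appears with probability p; return |X| / p as the estimate."""
    p = 1.0
    X: Set[Point] = set()
    for S in stream:
        # Drop sampled points covered by S; they are re-sampled below.
        X = {x for x in X if not S.contains(x)}
        # Number of points of S that enter the sample at rate p.
        n = binomial(S.cardinality(), p, rng)
        # If the sample would get too large, halve the sampling rate.
        while len(X) + n > thresh:
            p /= 2
            X = {x for x in X if rng.random() < 0.5}
            n = binomial(n, 0.5, rng)
        # Draw n distinct uniform points of S using the sampling query.
        fresh: Set[Point] = set()
        while len(fresh) < n:
            fresh.add(S.sample(rng))
        X |= fresh
    return len(X) / p


rects = [Rectangle((1, 1), (4, 4)), Rectangle((3, 3), (6, 6))]
# thresh exceeds the union size, so p stays 1 and the count is exact.
print(estimate_union(rects, thresh=1000, rng=random.Random(0)))  # 28.0
```

With a small `thresh` the sampling rate drops below 1 and the output becomes a randomized estimate; how large the threshold must be for an (ε,δ)-guarantee is exactly the kind of question the bounds in the abstract address, and this sketch makes no claim about it.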

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
Keywords
  • Sampling
  • Streaming
  • Klee’s Measure Problem


References

  1. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: https://doi.org/10.1006/jcss.1997.1545.
  2. Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In Proc. of RANDOM, pages 1-10, 2002. URL: https://doi.org/10.1007/3-540-45726-7_1.
  3. Vladimir Batagelj and Ulrik Brandes. Efficient generation of large random networks. Phys. Rev. E, 71:036113, March 2005. URL: https://doi.org/10.1103/PhysRevE.71.036113.
  4. Jon Louis Bentley. Algorithms for Klee’s rectangle problems. Technical report, Computer, 1977.
  5. Jaroslaw Blasiok. Optimal streaming and tracking distinct elements with high probability. In Proc. of SODA, 2018. URL: https://doi.org/10.1137/1.9781611975031.156.
  6. Karl Bringmann and Tobias Friedrich. Approximating the volume of unions and intersections of high-dimensional geometric objects. Comput. Geom., 43(6-7):601-610, 2010. URL: https://doi.org/10.1016/j.comgeo.2010.03.004.
  7. Larry Carter and Mark N. Wegman. Universal classes of hash functions. J. Comput. Syst. Sci., 18(2):143-154, 1979. URL: https://doi.org/10.1016/0022-0000(79)90044-8.
  8. Timothy M. Chan. A (slightly) faster algorithm for Klee’s measure problem. Comput. Geom., 43(3):243-250, 2010. URL: https://doi.org/10.1016/j.comgeo.2009.01.007.
  9. Eric Y Chen and Timothy M Chan. Space-efficient algorithms for Klee’s measure problem. algorithms, 3(5):6, 2005.
  10. Bogdan S. Chlebus. On the Klee’s measure problem in small dimensions. In Branislav Rovan, editor, SOFSEM '98: Theory and Practice of Informatics, 25th Conference on Current Trends in Theory and Practice of Informatics, Jasná, volume 1521, pages 304-311, 1998. URL: https://doi.org/10.1007/3-540-49477-4_22.
  11. Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182-209, 1985. URL: https://doi.org/10.1016/0022-0000(85)90041-8.
  12. Michael L Fredman and Bruce Weide. On the complexity of computing the measure of ⋃[a_i, b_i]. Communications of the ACM, 21(7):540-544, 1978.
  13. E. Grädel, W. Thomas, and T. Wilke. Automata, Logics, and Infinite Games: A Guide to Current Research. Lecture Notes in Computer Science 2500. Springer, 2002.
  14. Joachim Gudmundsson and Rasmus Pagh. Range-efficient consistent sampling and locality-sensitive hashing for polygons. In 28th International Symposium on Algorithms and Computation, ISAAC, volume 92 of LIPIcs, pages 42:1-42:13, 2017. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2017.42.
  15. Piotr Indyk and David P. Woodruff. Tight lower bounds for the distinct elements problem. In 44th Symposium on Foundations of Computer Science (FOCS 2003), 11-14 October 2003, Cambridge, MA, USA, Proceedings, pages 283-288. IEEE Computer Society, 2003. URL: https://doi.org/10.1109/SFCS.2003.1238202.
  16. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pages 41-52, 2010. URL: https://doi.org/10.1145/1807085.1807094.
  17. Victor Klee. Can the measure of ⋃[a_i, b_i] be computed in less than O(n log n) steps? The American Mathematical Monthly, 84(4):284-285, 1977.
  18. Kuldeep S. Meel, Sourav Chakraborty, and N. V. Vinodchandran. Estimation of the size of union of delphic sets: Achieving independence from stream size. In Leonid Libkin and Pablo Barceló, editors, PODS, pages 41-52. ACM, 2022. URL: https://doi.org/10.1145/3517804.3526222.
  19. Kuldeep S. Meel, N. V. Vinodchandran, and Sourav Chakraborty. Estimating the size of union of sets in streaming models. In Proc. of PODS, pages 126-137, 2021. URL: https://doi.org/10.1145/3452021.3458333.
  20. Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, USA, 2nd edition, 2017.
  21. Mark H Overmars and Chee-Keng Yap. New upper bounds in Klee’s measure problem. SIAM Journal on Computing, 20(6):1034-1045, 1991. URL: https://doi.org/10.1137/0220065.
  22. A. Pavan and Srikanta Tirthapura. Range-efficient counting of distinct elements in a massive data stream. SIAM J. Comput., 37(2):359-379, 2007. URL: https://doi.org/10.1137/050643672.
  23. Aduri Pavan, N. V. Vinodchandran, Arnab Bhattacharya, and Kuldeep S. Meel. Model counting meets F_0 estimation. In PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 299-311. ACM, 2021. URL: https://doi.org/10.1145/3452021.3458311.
  24. He Sun and Chung Keung Poon. Two improved range-efficient algorithms for F_0 estimation. Theor. Comput. Sci., 410(11):1073-1080, 2009. URL: https://doi.org/10.1016/j.tcs.2008.10.031.
  25. Srikanta Tirthapura and David P. Woodruff. Rectangle-efficient aggregation in spatial data streams. In Proc. of PODS, pages 283-294. ACM, 2012. URL: https://doi.org/10.1145/2213556.2213595.
  26. Richard Ryan Williams. Time-space tradeoffs for counting NP solutions modulo integers. Computational Complexity, 17:179-219, 2007. URL: https://api.semanticscholar.org/CorpusID:8815358.