Independent Range Sampling, Revisited Again

Afshani, Peyman; Phillips, Jeff M.

doi:10.4230/LIPIcs.SoCG.2019.4

File

LIPIcs.SoCG.2019.4.pdf

Filesize: 467 kB
13 pages

Document Identifiers

DOI: 10.4230/LIPIcs.SoCG.2019.4
URN: urn:nbn:de:0030-drops-104088

Author Details

Peyman Afshani

Aarhus University, Denmark

Jeff M. Phillips

University of Utah, Salt Lake City, USA

Cite As

Peyman Afshani and Jeff M. Phillips. Independent Range Sampling, Revisited Again. In 35th International Symposium on Computational Geometry (SoCG 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 129, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.SoCG.2019.4

Abstract

We revisit the range sampling problem: the input is a set of points where each point is associated with a real-valued weight. The goal is to store them in a structure such that, given a query range and an integer k, we can extract k independent random samples from the points inside the query range, where the probability of sampling a point is proportional to its weight. This line of work was initiated in 2014 by Hu, Qiao, and Tao, and it was later followed up by Afshani and Wei. The first line of work mostly studied the unweighted but dynamic version of the problem in one dimension, whereas the second result considered the static weighted problem in one dimension as well as the unweighted problem in 3D for halfspace queries. We offer three main results and some interesting insights that were missed by the previous work: we show that it is possible to build efficient data structures for range sampling queries if we allow the query time to hold in expectation (the first result), or to obtain efficient worst-case query bounds by allowing the sampling probability to be approximately proportional to the weight (the second result). The third result is a conditional lower bound showing that essentially one of the previous two concessions is needed. For instance, for 3D range sampling queries, the first two results give efficient data structures with near-linear space and polylogarithmic query time, whereas the lower bound shows that, with near-linear space, the worst-case query time must be close to n^{2/3}, ignoring polylogarithmic factors. To our knowledge, this is the first such major gap between the expected and worst-case query time of a range searching problem.
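
To make the problem statement concrete, the following is a minimal sketch of a one-dimensional baseline for weighted range sampling. It is not the data structure from the paper: it simply sorts the points, stores prefix sums of the weights, and draws each sample by binary search, which costs O(log n) per sample rather than the bounds discussed above. The class name `WeightedRangeSampler1D` and its method names are hypothetical, chosen only for illustration, and weights are assumed to be strictly positive.

```python
import bisect
import random
from itertools import accumulate


class WeightedRangeSampler1D:
    """Illustrative baseline (hypothetical name), not the paper's structure."""

    def __init__(self, points):
        # points: iterable of (coordinate, weight) pairs, weight > 0 (assumption).
        pts = sorted(points)
        self.xs = [x for x, _ in pts]
        # prefix[m] is the total weight of the first m points in sorted order.
        self.prefix = [0.0] + list(accumulate(w for _, w in pts))

    def sample(self, lo, hi, k, rng=random):
        """Return k independent samples from the points with lo <= x <= hi,
        each drawn with probability proportional to its weight."""
        i = bisect.bisect_left(self.xs, lo)   # first index inside the range
        j = bisect.bisect_right(self.xs, hi)  # one past the last index inside
        if i >= j:
            return []  # no points fall in the query range
        base = self.prefix[i]
        total = self.prefix[j] - base
        out = []
        for _ in range(k):
            # Pick a position in the weight mass of the range, then locate
            # the point whose weight interval contains it.
            t = base + rng.random() * total
            m = bisect.bisect_right(self.prefix, t, i, j + 1) - 1
            m = min(m, j - 1)  # guard against floating-point rounding at the top end
            out.append(self.xs[m])
        return out


sampler = WeightedRangeSampler1D([(1, 1.0), (2, 2.0), (5, 3.0), (9, 4.0)])
samples = sampler.sample(2, 5, 10)  # each entry is 2 or 5, weighted 2:3
```

Samples are drawn with replacement, so the same point may appear several times in one answer, which matches the requirement that the k samples be independent.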

Subject Classification

ACM Subject Classification

Theory of computation → Randomness, geometry and discrete structures
Theory of computation → Computational geometry

Keywords

Range Searching
Data Structures
Sampling

References

Peyman Afshani and Jeff M. Phillips. Independent Range Sampling, Revisited Again, 2019. URL: http://arxiv.org/abs/1903.08014.
Peyman Afshani and Zhewei Wei. Independent Range Sampling, Revisited. In European Symposium on Algorithms, pages 3:1-3:14, 2017.
Pankaj K. Agarwal and Jeff Erickson. Geometric range searching and its relatives. Advances in Discrete and Computational Geometry, pages 1-56, 1999.
Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, and Ion Stoica. BlinkDB: queries with bounded errors and bounded response times on very large data. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 29-42. ACM, 2013.
Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Random sampling for histogram construction: How much is enough? In ACM SIGMOD Record, pages 436-447. ACM, 1998.
T. Hagerup, K. Mehlhorn, and J. I. Munro. Optimal algorithms for generating discrete random variables with changing distributions. Lecture Notes in Computer Science, 700:253-264, 1993.
Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. Online aggregation. ACM SIGMOD Record, 26(2):171-182, 1997.
Xiaocheng Hu, Miao Qiao, and Yufei Tao. Independent range sampling. In ACM Symposium on Principles of Database Systems, pages 246-255, 2014.
Jiří Matoušek. Efficient partition trees. Discrete & Computational Geometry, 8(3):315-334, 1992.
Frank Olken. Random sampling from databases. PhD thesis, University of California at Berkeley, 1993.
Frank Olken and Doron Rotem. Random sampling from databases: a survey. Statistics and Computing, 5(1):25-42, 1995.
Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10(8):127-128, 1974.