Independent Range Sampling, Revisited

Authors Peyman Afshani, Zhewei Wei



PDF
Thumbnail PDF

File

LIPIcs.ESA.2017.3.pdf
  • Filesize: 0.63 MB
  • 14 pages

Document Identifiers

Author Details

Peyman Afshani
Zhewei Wei

Cite As Get BibTex

Peyman Afshani and Zhewei Wei. Independent Range Sampling, Revisited. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 3:1-3:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ESA.2017.3

Abstract

In the independent range sampling (IRS) problem, given an input set P of n points in R^d, the task is to build a data structure, such that given a range R and an integer t >= 1, it returns t points that are uniformly and independently drawn from P cap R. The samples must satisfy  inter-query independence, that is, the samples returned by every query must be independent of the samples returned by  all the previous queries. This problem was first tackled by Hu, Qiao and Tao in 2014, who proposed optimal structures for one-dimensional dynamic  IRS problem in internal memory and one-dimensional static IRS problem  in external memory.

In this paper, we study two natural extensions of the independent range sampling problem. In the first extension, we consider the static IRS problem in two and three dimensions in internal memory. We obtain data structures with optimal space-query tradeoffs for 3D halfspace, 3D dominance, and 2D three-sided queries. The second extension considers weighted IRS problem. Each point is associated with a real-valued weight, and given a query range R, a sample is drawn independently such that each point in P cap R is selected with probability proportional to its weight. Walker's alias method is a classic solution to this problem when no query range is specified. We obtain optimal data structure for one dimensional weighted range sampling problem, thereby extending the alias method to allow range queries.

Subject Classification

Keywords
  • data structures
  • range searching
  • range sampling
  • random sampling

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. URL: http://weizhewei.com/papers/esa17-full.pdf.
  2. Peyman Afshani and Timothy M. Chan. On approximate range counting and depth. Discrete and Computational Geometry, 42:3-21, 2009. URL: http://dx.doi.org/10.1007/s00454-009-9177-z.
  3. Peyman Afshani and Timothy M. Chan. Optimal halfspace range reporting in three dimensions. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 180-186, 2009. Google Scholar
  4. Peyman Afshani, Chris Hamilton, and Norbert Zeh. A general approach for cache-oblivious range reporting and approximate range counting. Computational Geometry: Theory and Applications, 43:700-712, 2010. preliminary version at SoCG'09. Google Scholar
  5. Pankaj K. Agarwal and Jeff Erickson. Geometric range searching and its relatives. Advances in Discrete and Computational Geometry, pages 1-56, 1999. Google Scholar
  6. Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, and Ion Stoica. Blinkdb: queries with bounded errors and bounded response times on very large data. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 29-42. ACM, 2013. Google Scholar
  7. Arnab Bhadury, Jianfei Chen, Jun Zhu, and Shixia Liu. Scaling up dynamic topic models. In Proceedings of International World Wide Web Conferences (WWW), pages 381-390. International World Wide Web Conferences Steering Committee, 2016. Google Scholar
  8. Timothy M. Chan. Random sampling, halfspace range reporting, and construction of (<= k)-levels in three dimensions. SIAM Journal of Computing, 30(2):561-575, 2000. Google Scholar
  9. Timothy M. Chan, Kasper Green Larsen, and Mihai Patrascu. Orthogonal Range Searching on the RAM, Revisited. arXiv preprint arXiv:1103.5510, 2011. Google Scholar
  10. Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Random sampling for histogram construction: How much is enough? In ACM SIGMOD Record, pages 436-447. ACM, 1998. Google Scholar
  11. Bernard Chazelle, Leonidas J. Guibas, and D. T. Lee. The power of geometric duality. BIT Numerical Mathematics, 25(1):76-90, 1985. Google Scholar
  12. Robert Christensen, Lu Wang, Feifei Li, Ke Yi, Jun Tang, and Natalee Villa. Storm: Spatio-temporal online reasoning and management of large spatio-temporal data. In Proceedings of ACM Management of Data (SIGMOD), pages 1111-1116. ACM, 2015. Google Scholar
  13. Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag, 3 edition, 2008. Google Scholar
  14. Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. Online aggregation. ACM SIGMOD Record, 26(2):171-182, 1997. Google Scholar
  15. Xiaocheng Hu, Miao Qiao, and Yufei Tao. Independent range sampling. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 246-255. ACM, 2014. Google Scholar
  16. Aaron Q. Li, Amr Ahmed, Sujith Ravi, and Alexander J. Smola. Reducing the sampling complexity of topic models. In Proceedings of ACM Knowledge Discovery and Data Mining (SIGKDD), pages 891-900. ACM, 2014. Google Scholar
  17. Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. Wander join: Online aggregation via random walks. In Proceedings of the 2016 International Conference on Management of Data, pages 615-629. ACM, 2016. Google Scholar
  18. Jiří Matoušek. Efficient partition trees. Discrete & Computational Geometry, 8(3):315-334, 1992. Google Scholar
  19. Frank Olken. Random sampling from databases. PhD thesis, University of California at Berkeley, 1993. Google Scholar
  20. Frank Olken and Doron Rotem. Random sampling from databases: a survey. Statistics and Computing, 5(1):25-42, 1995. Google Scholar
  21. Frank Olken and Doron Rotem. Sampling from spatial databases. Statistics and Computing, 5(1):43-57, 1995. Google Scholar
  22. Mihai Patrascu and Mikkel Thorup. Time-space trade-offs for predecessor search. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 232-240, 2006. Google Scholar
  23. Edgar A. Ramos. On range reporting, ray shooting and k-level construction. In Symposium on Computational Geometry (SoCG), pages 390-399, 1999. Google Scholar
  24. Jiří Matoušek. Reporting points in halfspaces. Computational Geometry, Theory and Applications, 2(3):169-186, 1992. Google Scholar
  25. Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10(8):127-128, 1974. Google Scholar
  26. Lu Wang, Robert Christensen, Feifei Li, and Ke Yi. Spatial online sampling and aggregation. Proceedings of the VLDB Endowment, 9(3), 2015. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail