Parallel Weighted Random Sampling

Authors Lorenz Hübschle-Schneider, Peter Sanders



PDF
Thumbnail PDF

File

LIPIcs.ESA.2019.59.pdf
  • Filesize: 0.64 MB
  • 24 pages

Document Identifiers

Author Details

Lorenz Hübschle-Schneider
  • Karlsruhe Institute of Technology, Germany
Peter Sanders
  • Karlsruhe Institute of Technology, Germany

Cite As Get BibTex

Lorenz Hübschle-Schneider and Peter Sanders. Parallel Weighted Random Sampling. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 59:1-59:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.ESA.2019.59

Abstract

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
  • Theory of computation → Parallel algorithms
  • Theory of computation → Data structures design and analysis
Keywords
  • categorical distribution
  • multinoulli distribution
  • parallel algorithm
  • alias method
  • PRAM
  • communication efficient algorithm
  • subset sampling
  • reservoir sampling

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Joachim H. Ahrens and Ulrich Dieter. Sequential Random Sampling. ACM Transactions on Mathematical Software (TOMS), 11(2):157-169, June 1985. Google Scholar
  2. Richard Arratia. On the amount of dependence in the prime factorization of a uniform random integer. Contemporary combinatorics, 10:29-91, 2002. Page 36. Google Scholar
  3. Timo Bingmann, Michael Axtmann, Emanual Jöbstl, Sebastian Lamm, Huyen Chau Nguyen, Alexander Noe, Sebastian Schlag, Matthias Stumpp, Tobias Sturm, and Peter Sanders. Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++. In 2016 IEEE International Conference on Big Data, pages 172-183. IEEE, 2016. Google Scholar
  4. Guy E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38(11):1526-1538, November 1989. Google Scholar
  5. Karl Bringmann and Kasper Green Larsen. Succinct Sampling from Discrete Distributions. In 45th ACM Symposium on Theory of Computing (STOC), pages 775-782. ACM, 2013. Google Scholar
  6. Karl Bringmann and Konstantinos Panagiotou. Efficient sampling methods for discrete distributions. Algorithmica, 79(2):484-508, 2017. Google Scholar
  7. M. T. Chao. A general purpose unequal probability sampling plan. Biometrika, 69(3):653-656, 1982. Google Scholar
  8. Richard Cole. Parallel Merge Sort. SIAM Journal on Computing, 17(4):770-785, 1988. Google Scholar
  9. Luc Devroye. Non-Uniform Random Variate Generation. Springer, 1986. Google Scholar
  10. Pavlos S Efraimidis. Weighted random sampling over data streams. In Algorithms, Probability, Networks, and Games: Scientific Papers and Essays Dedicated to Paul G. Spirakis on the Occasion of His 60th Birthday, pages 183-195. Springer, 2015. Google Scholar
  11. Pavlos S Efraimidis and Paul G Spirakis. Fast parallel weighted random sampling. Technical Report TR99.04.02, CTI Patras, 1999. Google Scholar
  12. Pavlos S Efraimidis and Paul G Spirakis. Weighted random sampling with a reservoir. Information Processing Letters, 97(5):181-185, 2006. Google Scholar
  13. Mark Galassi, Jim Davies, James Theiler, Brian Gough, Gerard Jungmann, Patrick Alken, Michael Booth, Fabrice Rossi, and Rhys Ulerich. GNU scientific library: reference manual. Network Theory, 3 edition, 2009. Google Scholar
  14. Torben Hagerup. Fast Parallel Generation of Random Permutations. In 18th International Colloquium on Automata, Languages and Programming (ICALP), pages 405-416. Springer, 1991. Google Scholar
  15. Torben Hagerup, Kurt Mehlhorn, and J Ian Munro. Maintaining discrete probability distributions optimally. In 20th International Colloquium on Automata, Languages, and Programming (ICALP), pages 253-264. Springer, 1993. Google Scholar
  16. Lorenz Hübschle-Schneider and Peter Sanders. Communication Efficient Algorithms for Top-k Selection Problems. In 30th International Parallel and Distributed Processing Symposium (IPDPS), pages 659-668. IEEE, 2016. Google Scholar
  17. Intel. Intel Math Kernel Library 2019. Intel, 2019. URL: https://software.intel.com/en-us/mkl-reference-manual-for-c.
  18. Joseph JáJá. An Introduction to Parallel Algorithms. Addison Wesley, 1992. Google Scholar
  19. Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing. Design and Analysis of Algorithms. Benjamin/Cummings, 1994. Google Scholar
  20. Kevin J Lang. Practical algorithms for generating a random ordering of the elements of a weighted set. Theory of Computing Systems, 54(4):659-688, 2014. Google Scholar
  21. George Marsaglia, Wai Wan Tsang, Jingbo Wang, et al. Fast generation of discrete random variables. Journal of Statistical Software, 11(3):1-11, 2004. Google Scholar
  22. Yossi Matias, Jeffrey Scott Vitter, and Wen-Chun Ni. Dynamic generation of discrete random variates. Theory of Computing Systems, 36(4):329-358, 2003. Google Scholar
  23. M. Matsumoto and T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACMTMCS: ACM Transactions on Modeling and Computer Simulation, 8:3-30, 1998. Google Scholar
  24. Jens Maue and Peter Sanders. Engineering Algorithms for Approximate Weighted Matching. In 6th Workshop on Experimental Algorithms (WEA), pages 242-255. Springer, 2007. Google Scholar
  25. Kurt Mehlhorn and Peter Sanders. Algorithms and Data Structures - The Basic Toolbox. Springer, 2008. Google Scholar
  26. Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995. Google Scholar
  27. Kirill Müller. Accelerating weighted random sampling without replacement. Arbeitsberichte Verkehrs-und Raumplanung, 1141, 2016. Google Scholar
  28. Frank Olken and Doron Rotem. Random sampling from databases: a survey. Statistics and Computing, 5(1):25-42, 1995. Google Scholar
  29. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2019. URL: https://www.R-project.org.
  30. Sanguthevar Rajasekaran and John H Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM Journal on Computing, 18(3):594-607, 1989. Google Scholar
  31. Abhiram G Ranade. How to emulate shared memory. Journal of Computer and System Sciences, 42(3):307-326, 1991. Google Scholar
  32. Peter Sanders. On the Competitive Analysis of Randomized Static Load Balancing. In S. Rajasekaran, editor, First Workshop on Randomized Parallel Algorithms, Honolulu, Hawaii, 1996. http://algo2.iti.kit.edu/sanders/papers/rand96.pdf.
  33. Peter Sanders. Random Permutations on Distributed, External and Hierarchical Memory. Information Processing Letters, 67(6):305-310, 1998. Google Scholar
  34. Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online. ACM Transaction on Mathematical Sofware, 44(3):29:1-29:14, 2018. Google Scholar
  35. Peter Sanders, Sebastian Schlag, and Ingo Müller. Communication Efficient Algorithms for Fundamental Big Data Problems. In 2013 IEEE International Conference on Big Data, pages 15-23. IEEE, 2013. Google Scholar
  36. Julian Shun. Improved parallel construction of wavelet trees and rank/select structures. In 2017 Data Compression Conference (DCC), pages 92-101. IEEE, 2017. Google Scholar
  37. Michael D. Vose. A linear algorithm for generating random numbers with a given distribution. IEEE Transactions on Software Engineering (TSE), 17(9):972-975, 1991. Google Scholar
  38. Alastair J Walker. An efficient method for generating discrete random variables with general distributions. ACM Transactions on Mathematical Software (TOMS), 3(3):253-256, 1977. Google Scholar
  39. Wikichip.org. Intel Xeon Gold 6138. https://en.wikichip.org/w/index.php?title=intel/xeon_gold/6138&oldid=71062, 2019. Accessed April 26, 2019.
  40. Chak-Kuen Wong and Malcolm C. Easton. An efficient method for weighted sampling without replacement. SIAM Journal on Computing, 9(1):111-113, 1980. Google Scholar
  41. Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In 24th ACM Symposium on Operating Systems Principles (SOSP), pages 423-438. ACM, 2013. Google Scholar
  42. Matei Zaharia, Reynold S Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J Franklin, et al. Apache Spark: a unified engine for big data processing. Communications of the ACM, 59(11):56-65, 2016. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail