Parallel Weighted Random Sampling

Hübschle-Schneider, Lorenz; Sanders, Peter

doi:10.4230/LIPIcs.ESA.2019.59

File

Subject Classification

ACM Subject Classification

Theory of computation → Sketching and sampling
Theory of computation → Parallel algorithms
Theory of computation → Data structures design and analysis

Keywords

categorical distribution
multinoulli distribution
parallel algorithm
alias method
PRAM
communication efficient algorithm
subset sampling
reservoir sampling

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.

Cite As Get BibTex

Lorenz Hübschle-Schneider and Peter Sanders. Parallel Weighted Random Sampling. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 59:1-59:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.ESA.2019.59

Author Details

Lorenz Hübschle-Schneider

Karlsruhe Institute of Technology, Germany

Peter Sanders

Karlsruhe Institute of Technology, Germany

References

Joachim H. Ahrens and Ulrich Dieter. Sequential Random Sampling. ACM Transactions on Mathematical Software (TOMS), 11(2):157-169, June 1985.
Richard Arratia. On the amount of dependence in the prime factorization of a uniform random integer. Contemporary combinatorics, 10:29-91, 2002. Page 36.
Timo Bingmann, Michael Axtmann, Emanual Jöbstl, Sebastian Lamm, Huyen Chau Nguyen, Alexander Noe, Sebastian Schlag, Matthias Stumpp, Tobias Sturm, and Peter Sanders. Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++. In 2016 IEEE International Conference on Big Data, pages 172-183. IEEE, 2016.
Guy E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38(11):1526-1538, November 1989.
Karl Bringmann and Kasper Green Larsen. Succinct Sampling from Discrete Distributions. In 45th ACM Symposium on Theory of Computing (STOC), pages 775-782. ACM, 2013.
Karl Bringmann and Konstantinos Panagiotou. Efficient sampling methods for discrete distributions. Algorithmica, 79(2):484-508, 2017.
M. T. Chao. A general purpose unequal probability sampling plan. Biometrika, 69(3):653-656, 1982.
Richard Cole. Parallel Merge Sort. SIAM Journal on Computing, 17(4):770-785, 1988.
Luc Devroye. Non-Uniform Random Variate Generation. Springer, 1986.
Pavlos S Efraimidis. Weighted random sampling over data streams. In Algorithms, Probability, Networks, and Games: Scientific Papers and Essays Dedicated to Paul G. Spirakis on the Occasion of His 60th Birthday, pages 183-195. Springer, 2015.
Pavlos S Efraimidis and Paul G Spirakis. Fast parallel weighted random sampling. Technical Report TR99.04.02, CTI Patras, 1999.
Pavlos S Efraimidis and Paul G Spirakis. Weighted random sampling with a reservoir. Information Processing Letters, 97(5):181-185, 2006.
Mark Galassi, Jim Davies, James Theiler, Brian Gough, Gerard Jungmann, Patrick Alken, Michael Booth, Fabrice Rossi, and Rhys Ulerich. GNU scientific library: reference manual. Network Theory, 3 edition, 2009.
Torben Hagerup. Fast Parallel Generation of Random Permutations. In 18th International Colloquium on Automata, Languages and Programming (ICALP), pages 405-416. Springer, 1991.
Torben Hagerup, Kurt Mehlhorn, and J Ian Munro. Maintaining discrete probability distributions optimally. In 20th International Colloquium on Automata, Languages, and Programming (ICALP), pages 253-264. Springer, 1993.
Lorenz Hübschle-Schneider and Peter Sanders. Communication Efficient Algorithms for Top-k Selection Problems. In 30th International Parallel and Distributed Processing Symposium (IPDPS), pages 659-668. IEEE, 2016.
Intel. Intel Math Kernel Library 2019. Intel, 2019. URL: https://software.intel.com/en-us/mkl-reference-manual-for-c.
Joseph JáJá. An Introduction to Parallel Algorithms. Addison Wesley, 1992.
Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis. Introduction to Parallel Computing. Design and Analysis of Algorithms. Benjamin/Cummings, 1994.
Kevin J Lang. Practical algorithms for generating a random ordering of the elements of a weighted set. Theory of Computing Systems, 54(4):659-688, 2014.
George Marsaglia, Wai Wan Tsang, Jingbo Wang, et al. Fast generation of discrete random variables. Journal of Statistical Software, 11(3):1-11, 2004.
Yossi Matias, Jeffrey Scott Vitter, and Wen-Chun Ni. Dynamic generation of discrete random variates. Theory of Computing Systems, 36(4):329-358, 2003.
M. Matsumoto and T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACMTMCS: ACM Transactions on Modeling and Computer Simulation, 8:3-30, 1998.
Jens Maue and Peter Sanders. Engineering Algorithms for Approximate Weighted Matching. In 6th Workshop on Experimental Algorithms (WEA), pages 242-255. Springer, 2007.
Kurt Mehlhorn and Peter Sanders. Algorithms and Data Structures - The Basic Toolbox. Springer, 2008.
Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
Kirill Müller. Accelerating weighted random sampling without replacement. Arbeitsberichte Verkehrs-und Raumplanung, 1141, 2016.
Frank Olken and Doron Rotem. Random sampling from databases: a survey. Statistics and Computing, 5(1):25-42, 1995.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2019. URL: https://www.R-project.org.
Sanguthevar Rajasekaran and John H Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM Journal on Computing, 18(3):594-607, 1989.
Abhiram G Ranade. How to emulate shared memory. Journal of Computer and System Sciences, 42(3):307-326, 1991.
Peter Sanders. On the Competitive Analysis of Randomized Static Load Balancing. In S. Rajasekaran, editor, First Workshop on Randomized Parallel Algorithms, Honolulu, Hawaii, 1996. http://algo2.iti.kit.edu/sanders/papers/rand96.pdf.
Peter Sanders. Random Permutations on Distributed, External and Hierarchical Memory. Information Processing Letters, 67(6):305-310, 1998.
Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online. ACM Transaction on Mathematical Sofware, 44(3):29:1-29:14, 2018.
Peter Sanders, Sebastian Schlag, and Ingo Müller. Communication Efficient Algorithms for Fundamental Big Data Problems. In 2013 IEEE International Conference on Big Data, pages 15-23. IEEE, 2013.
Julian Shun. Improved parallel construction of wavelet trees and rank/select structures. In 2017 Data Compression Conference (DCC), pages 92-101. IEEE, 2017.
Michael D. Vose. A linear algorithm for generating random numbers with a given distribution. IEEE Transactions on Software Engineering (TSE), 17(9):972-975, 1991.
Alastair J Walker. An efficient method for generating discrete random variables with general distributions. ACM Transactions on Mathematical Software (TOMS), 3(3):253-256, 1977.
Wikichip.org. Intel Xeon Gold 6138. https://en.wikichip.org/w/index.php?title=intel/xeon_gold/6138&oldid=71062, 2019. Accessed April 26, 2019.
Chak-Kuen Wong and Malcolm C. Easton. An efficient method for weighted sampling without replacement. SIAM Journal on Computing, 9(1):111-113, 1980.
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In 24th ACM Symposium on Operating Systems Principles (SOSP), pages 423-438. ACM, 2013.
Matei Zaharia, Reynold S Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J Franklin, et al. Apache Spark: a unified engine for big data processing. Communications of the ACM, 59(11):56-65, 2016.

Parallel Weighted Random Sampling

Authors Lorenz Hübschle-Schneider, Peter Sanders

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

Parallel Weighted Random Sampling

Authors Lorenz Hübschle-Schneider, Peter Sanders

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Supplementary Materials

References

Thanks for your feedback!

Could not send message