Slide 1

Slide 1 text

Fast random number generation Bernardt Duvenhage Feersum Engine, Praekelt Consulting

Slide 2

Slide 2 text

Outline How to efficiently generate a random number. How to do it in Python and Numpy.

Slide 3

Slide 3 text

Generating a random number.

Slide 4

Slide 4 text

Generating a random number. A fair/unbiased dice can generate a uniform random number.

Slide 5

Slide 5 text

Generating a random number. A fair/unbiased dice can generate a uniform random number. Numbers 1-3, 1-4, 1-5, 1-6, 1-10, 1-12, 1-20, …

Slide 6

Slide 6 text

Generating a random number. A fair/unbiased dice can generate a uniform random number. Numbers 1-3, 1-4, 1-5, 1-6, 1-10, 1-12, 1-20, … Given these dice, how does one generate a uniform random number in [1,15]?

Slide 7

Slide 7 text

Generating a random number.

Slide 8

Slide 8 text

Generating a random number. One can use a computer program to generate pseudo random numbers.

Slide 9

Slide 9 text

Generating a random number. One can use a computer program to generate pseudo random numbers. LCG, Mersenne Twister, XorShift, PCG.

Slide 10

Slide 10 text

Generating a random number. One can use a computer program to generate pseudo random numbers. LCG, Mersenne Twister, XorShift, PCG. Numbers [0, 216), [0, 232), [0, 264), …

Slide 11

Slide 11 text

Generating a random number. One can use a computer program to generate pseudo random numbers. LCG, Mersenne Twister, XorShift, PCG. Numbers [0, 216), [0, 232), [0, 264), … Given a random number in [0, 232), how does one now generate a uniform random number in [1,15]?

Slide 12

Slide 12 text

Linear Congruential Generator

Slide 13

Slide 13 text

Linear Congruential Generator Xn+1 = (aXn + c) mod m.

Slide 14

Slide 14 text

Linear Congruential Generator Xn+1 = (aXn + c) mod m. If m is 232 Xn+1 = 1664525 * Xn + 1013904223 # numerical recipes Xn+1 = 1103515245 * Xn + 12345 # glibc Xn+1 = 214013 * Xn + 2531011 # msvs

Slide 15

Slide 15 text

Linear Congruential Generator Xn+1 = (aXn + c) mod m. If m is 232 Xn+1 = 1664525 * Xn + 1013904223 # numerical recipes Xn+1 = 1103515245 * Xn + 12345 # glibc Xn+1 = 214013 * Xn + 2531011 # msvs LCG128 >> 64 passes TestU01’s BigCrush! *

Slide 16

Slide 16 text

Linear Congruential Generator Xn+1 = (aXn + c) mod m. If m is 232 Xn+1 = 1664525 * Xn + 1013904223 # numerical recipes Xn+1 = 1103515245 * Xn + 12345 # glibc Xn+1 = 214013 * Xn + 2531011 # msvs LCG128 >> 64 passes TestU01’s BigCrush! * * [http://www.pcg-random.org/posts/does-it-beat-the-minimal-standard.html]

Slide 17

Slide 17 text

Multiplicative Congruential Generator

Slide 18

Slide 18 text

Multiplicative Congruential Generator Xn+1 = (aXn + 0) mod m

Slide 19

Slide 19 text

Multiplicative Congruential Generator Xn+1 = (aXn + 0) mod m MCG128 >> 64 passes TestU01’s BigCrush! * m = 2^128 a = 92563704562804186071655587898373606109 **

Slide 20

Slide 20 text

Multiplicative Congruential Generator Xn+1 = (aXn + 0) mod m MCG128 >> 64 passes TestU01’s BigCrush! * m = 2^128 a = 92563704562804186071655587898373606109 ** ** [Pierre L’ECUYER. 1999. Tables of Linear Congruential Generators of Different …]

Slide 21

Slide 21 text

Multiplicative Congruential Generator Xn+1 = (aXn + 0) mod m MCG128 >> 64 passes TestU01’s BigCrush! * m = 2^128 a = 92563704562804186071655587898373606109 ** ** [Pierre L’ECUYER. 1999. Tables of Linear Congruential Generators of Different …] * [http://www.pcg-random.org/posts/does-it-beat-the-minimal-standard.html]

Slide 22

Slide 22 text

Mersenne Twister, XorShift & PCG.

Slide 23

Slide 23 text

Mersenne Twister, XorShift & PCG. Mersenne Twister - 1997 Very popular, but relatively large mem footprint.

Slide 24

Slide 24 text

Mersenne Twister, XorShift & PCG. Mersenne Twister - 1997 Very popular, but relatively large mem footprint. XorShift - 2003 Much smaller mem footprint.

Slide 25

Slide 25 text

Mersenne Twister, XorShift & PCG. Mersenne Twister - 1997 Very popular, but relatively large mem footprint. XorShift - 2003 Much smaller mem footprint. Permuted Congruential Generator (PCG) - 2014 Small mem footprint & faster than MT and XorShift. Melissa O'Neill @ Stanford - https://www.youtube.com/watch?v=45Oet5qjlms

Slide 26

Slide 26 text

Where are PRNGs used?

Slide 27

Slide 27 text

Where are PRNGs used? When you don’t actually have a dice!

Slide 28

Slide 28 text

Where are PRNGs used? When you don’t actually have a dice! Any computational simulation of some physical system.

Slide 29

Slide 29 text

Where are PRNGs used? When you don’t actually have a dice! Any computational simulation of some physical system. Searching of high dimensional spaces.

Slide 30

Slide 30 text

Generating a random number. One can use a computer program to generate pseudo random numbers. LCG, Mersenne Twister, XorShift, PCG. Numbers [0, 216), [0, 232), [0, 264), … Given a random number in 232, how does one generate a uniform random number in any interval?

Slide 31

Slide 31 text

Random number rn in [1, s).

Slide 32

Slide 32 text

Random number rn in [1, s). Given Xn in [0, 2L) generate a uniform random number rn in [1, s).

Slide 33

Slide 33 text

Random number rn in [1, s). Given Xn in [0, 2L) generate a uniform random number rn in [1, s). We know Xn in [0, 2L) takes 3 - 10 clock cycles.

Slide 34

Slide 34 text

Random number rn in [1, s). Given Xn in [0, 2L) generate a uniform random number rn in [1, s). We know Xn in [0, 2L) takes 3 - 10 clock cycles. rn in [1, s) should not take much longer.

Slide 35

Slide 35 text

rn = Xn mod s

Slide 36

Slide 36 text

rn = Xn mod s Xn mod 3, for example, gives 1,0,0,2,0,1,2,1,0,1,2,…

Slide 37

Slide 37 text

rn = Xn mod s Xn mod 3, for example, gives 1,0,0,2,0,1,2,1,0,1,2,… If Xn in [0, 23) and s=3

Slide 38

Slide 38 text

rn = Xn mod s Xn mod 3, for example, gives 1,0,0,2,0,1,2,1,0,1,2,… If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5,6,7 => rn :0,1,2,0,1,2,0,1

Slide 39

Slide 39 text

rn = Xn mod s Xn mod 3, for example, gives 1,0,0,2,0,1,2,1,0,1,2,… If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5,6,7 => rn :0,1,2,0,1,2,0,1 0,1 appears more often than 2 so rn is not uniform!

Slide 40

Slide 40 text

rn = Xn mod s Xn mod 3, for example, gives 1,0,0,2,0,1,2,1,0,1,2,… If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5,6,7 => rn :0,1,2,0,1,2,0,1 0,1 appears more often than 2 so rn is not uniform! How efficient is this approach?

Slide 41

Slide 41 text

rn = Xn mod s Xn mod 3, for example, gives 1,0,0,2,0,1,2,1,0,1,2,… If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5,6,7 => rn :0,1,2,0,1,2,0,1 0,1 appears more often than 2 so rn is not uniform! How efficient is this approach? The mod/div can take 20 - 50+ cycles.

Slide 42

Slide 42 text

rn = Xn mod s with rejection

Slide 43

Slide 43 text

rn = Xn mod s with rejection If Xn in [0, 23) and s=3

Slide 44

Slide 44 text

rn = Xn mod s with rejection If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5|,6,7 => rn :0,1,2,0,1,2|,0,1

Slide 45

Slide 45 text

rn = Xn mod s with rejection If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5|,6,7 => rn :0,1,2,0,1,2|,0,1 While (Xn >= 2L - (2L mod s)) reject the sample.

Slide 46

Slide 46 text

rn = Xn mod s with rejection If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5|,6,7 => rn :0,1,2,0,1,2|,0,1 While (Xn >= 2L - (2L mod s)) reject the sample. Finally, rn = Xn mod s

Slide 47

Slide 47 text

rn = Xn mod s with rejection If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5|,6,7 => rn :0,1,2,0,1,2|,0,1 While (Xn >= 2L - (2L mod s)) reject the sample. Finally, rn = Xn mod s How efficient is this approach?

Slide 48

Slide 48 text

rn = Xn mod s with rejection If Xn in [0, 23) and s=3 Xn :0,1,2,3,4,5|,6,7 => rn :0,1,2,0,1,2|,0,1 While (Xn >= 2L - (2L mod s)) reject the sample. Finally, rn = Xn mod s How efficient is this approach? Two mods and 2/8 of the samples are rejected in this case.

Slide 49

Slide 49 text

Masking + rejection sampling

Slide 50

Slide 50 text

Masking + rejection sampling Generate Xn * in [0, 2M) for M such that 2M >= s

Slide 51

Slide 51 text

Masking + rejection sampling Generate Xn * in [0, 2M) for M such that 2M >= s Xn * = Xn & 2M-1

Slide 52

Slide 52 text

Masking + rejection sampling Generate Xn * in [0, 2M) for M such that 2M >= s Xn * = Xn & 2M-1 While (Xn * >= s) reject the sample.

Slide 53

Slide 53 text

Masking + rejection sampling Generate Xn * in [0, 2M) for M such that 2M >= s Xn * = Xn & 2M-1 While (Xn * >= s) reject the sample. How efficient is rejection sampling + masking?

Slide 54

Slide 54 text

Masking + rejection sampling Generate Xn * in [0, 2M) for M such that 2M >= s Xn * = Xn & 2M-1 While (Xn * >= s) reject the sample. How efficient is rejection sampling + masking? Worst case s=2(M-1)+1 e.g. 1025 if M=11 => approx. 50% rejected.

Slide 55

Slide 55 text

Masking + rejection sampling Generate Xn * in [0, 2M) for M such that 2M >= s Xn * = Xn & 2M-1 While (Xn * >= s) reject the sample. How efficient is rejection sampling + masking? Worst case s=2(M-1)+1 e.g. 1025 if M=11 => approx. 50% rejected. No mods.

Slide 56

Slide 56 text

Integer scaling

Slide 57

Slide 57 text

Integer scaling Generate Xn in [0, 232) and calc Qn = Xn x s

Slide 58

Slide 58 text

Integer scaling Generate Xn in [0, 232) and calc Qn = Xn x s Finally, rn = Qn >> 32

Slide 59

Slide 59 text

Integer scaling Generate Xn in [0, 232) and calc Qn = Xn x s Finally, rn = Qn >> 32 How efficient is this approach?

Slide 60

Slide 60 text

Integer scaling Generate Xn in [0, 232) and calc Qn = Xn x s Finally, rn = Qn >> 32 How efficient is this approach? 64-bit is very efficient on modern CPUs.

Slide 61

Slide 61 text

Integer scaling Generate Xn in [0, 232) and calc Qn = Xn x s Finally, rn = Qn >> 32 How efficient is this approach? 64-bit is very efficient on modern CPUs. Does this generate a uniform random number rn in [1, s)?

Slide 62

Slide 62 text

Integer scaling Generate Xn in [0, 232) and calc Qn = Xn x s Finally, rn = Qn >> 32 How efficient is this approach? 64-bit is very efficient on modern CPUs. Does this generate a uniform random number rn in [1, s)? No, some numbers in [1, s) will appear more often than others.

Slide 63

Slide 63 text

Integer scaling (cont.) E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Xn ∈ 0,1,2,3,4,5,6,7

Slide 64

Slide 64 text

Integer scaling (cont.) E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Xn ∈ 0,1,2,3,4,5,6,7 Qn = Xn x 3 in [0, 24) 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23

Slide 65

Slide 65 text

Integer scaling (cont.) E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Xn ∈ 0,1,2,3,4,5,6,7 Qn = Xn x 3 in [0, 24) 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23

Slide 66

Slide 66 text

Integer scaling (cont.) E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Xn ∈ 0,1,2,3,4,5,6,7 Qn = Xn x 3 in [0, 24) 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23

Slide 67

Slide 67 text

Integer scaling (cont.) E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Xn ∈ 0,1,2,3,4,5,6,7 Qn = Xn x 3 in [0, 24) 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 Qn ÷ 23 in [0, 3) {0,1,2,3,4,5,6,7} => 0, {8,9,10,11,12,13,14,15} => 1, {16,17,18,19,20,21,22,23} => 2 appears less often than 0 and 1!

Slide 68

Slide 68 text

Integer scaling + rejection sampling* * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 69

Slide 69 text

Integer scaling + rejection sampling* E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 70

Slide 70 text

Integer scaling + rejection sampling* E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Qn ÷ 23 in [0, 3) {0,1,|2,3,4,5,6,7} => 0, {8,9,|10,11,12,13,14,15} => 1, {16,17,|18,19,20,21,22,23} => 2 * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 71

Slide 71 text

Integer scaling + rejection sampling* E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Qn ÷ 23 in [0, 3) {0,1,|2,3,4,5,6,7} => 0, {8,9,|10,11,12,13,14,15} => 1, {16,17,|18,19,20,21,22,23} => 2 Reject if Qn in first (2L mod s) positions of an s bin * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 72

Slide 72 text

Integer scaling + rejection sampling* E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Qn ÷ 23 in [0, 3) {0,1,|2,3,4,5,6,7} => 0, {8,9,|10,11,12,13,14,15} => 1, {16,17,|18,19,20,21,22,23} => 2 Reject if Qn in first (2L mod s) positions of an s bin 2 will now appear as often as 0 and 1! * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 73

Slide 73 text

Integer scaling + rejection sampling* E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Qn ÷ 23 in [0, 3) {0,1,|2,3,4,5,6,7} => 0, {8,9,|10,11,12,13,14,15} => 1, {16,17,|18,19,20,21,22,23} => 2 Reject if Qn in first (2L mod s) positions of an s bin 2 will now appear as often as 0 and 1! How efficient is this approach? * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 74

Slide 74 text

Integer scaling + rejection sampling* E.g. Xn in [0, 2L) and calc Qn = Xn x s for s = 3 and L = 3 Qn ÷ 23 in [0, 3) {0,1,|2,3,4,5,6,7} => 0, {8,9,|10,11,12,13,14,15} => 1, {16,17,|18,19,20,21,22,23} => 2 Reject if Qn in first (2L mod s) positions of an s bin 2 will now appear as often as 0 and 1! How efficient is this approach? One mod and chance of rejection is (2L mod s)/2L < s/2L * Daniel Lemire. 2018. Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation (to appear)

Slide 75

Slide 75 text

Integer scaling + rejection sampling

Slide 76

Slide 76 text

Integer scaling + rejection sampling

Slide 77

Slide 77 text

Integer scaling + rejection sampling Mod/divide by 2L is just and/shift and therefore efficient.

Slide 78

Slide 78 text

Integer scaling + rejection sampling Mod/divide by 2L is just and/shift and therefore efficient. s is an upper bound of t and used so that we only do a mod with chance s/(2L-1).

Slide 79

Slide 79 text

Integer scaling + rejection sampling Mod/divide by 2L is just and/shift and therefore efficient. s is an upper bound of t and used so that we only do a mod with chance s/(2L-1). t is only dependent on s and not on x as well so no need to recalculate t whenever a number is rejected!

Slide 80

Slide 80 text

Integer scaling + rejection sampling

Slide 81

Slide 81 text

Python and Numpy

Slide 82

Slide 82 text

Python and Numpy Python uses mod+rejection OR masking+rejection.

Slide 83

Slide 83 text

Python and Numpy Python uses mod+rejection OR masking+rejection. Numpy uses masking+rejection.

Slide 84

Slide 84 text

Python and Numpy Python uses mod+rejection OR masking+rejection. Numpy uses masking+rejection. I did a Numpy PR in mid August, but Charles Harris pointed me to NEP-19 and the RandomGen project.

Slide 85

Slide 85 text

Python and Numpy Python uses mod+rejection OR masking+rejection. Numpy uses masking+rejection. I did a Numpy PR in mid August, but Charles Harris pointed me to NEP-19 and the RandomGen project. Kevin Sheppard from RandomGen seemed keen.

Slide 86

Slide 86 text

Python and Numpy Python uses mod+rejection OR masking+rejection. Numpy uses masking+rejection. I did a Numpy PR in mid August, but Charles Harris pointed me to NEP-19 and the RandomGen project. Kevin Sheppard from RandomGen seemed keen. So I did a RandomGen PR to optionally use Lemire’s algorithm for unsigned integers in an interval (busy testing 64 bit).

Slide 87

Slide 87 text

NEP-19 https://www.numpy.org/neps/nep-0019-rng-policy.html

Slide 88

Slide 88 text

RandomGen https://github.com/bashtage/randomgen - "Provides access to more modern PRNGs that support modern features such as streams and easy advancement so that they can be easily used on 1000s of nodes." May also be used as alternative to the Python random module. PRNGs and rejection sampling implemented in C. Provides a number of random number generators MT, XorShift, PCG, ThreeFry, Philox.

Slide 89

Slide 89 text

Results - Masked rejection vs. Lemire

Slide 90

Slide 90 text

Results - Masked rejection vs. Lemire Best case performance is slightly worse.

Slide 91

Slide 91 text

Results - Masked rejection vs. Lemire Best case performance is slightly worse. Reduces average execution time by 54%.

Slide 92

Slide 92 text

Results - Masked rejection vs. Lemire Best case performance is slightly worse. Reduces average execution time by 54%. Reduces worst case execution time by 73%.

Slide 93

Slide 93 text

Results - Approx. avrg. case With Lemire rejection Speed-up relative to NumPy MT 32-bit MT19937 85.0% PCG32 200.7% PCG64 188.3% ThreeFry 71.5% Xoroshiro128 258.0% Xorshift1024 209.6% Speed-up relative to NumPy MT 64-bit MT19937 96.8% PCG32 215.6% PCG64 260.5% ThreeFry 61.7% Xoroshiro128 396.9% Xorshift1024 331.8% Original Masked Rejection Speed-up relative to NumPy MT 32-bit: MT19937 -2.6% PCG32 46.5% PCG64 33.2% ThreeFry -12.1% Xoroshiro128 65.8% Xorshift1024 41.3% Speed-up relative to NumPy MT 64-bit: MT19937 -4.0% PCG32 29.5% PCG64 25.0% ThreeFry -5.4% Xoroshiro128 56.0% Xorshift1024 30.9%

Slide 94

Slide 94 text

Results - Confirmation http://www.pcg-random.org/posts/bounded-rands.html - 2018-07-22 "From our benchmarks, we can see that switching from a commonly- used PRNG (e.g., the 32-bit Mersenne Twister) to a faster PRNG reduced the execution time of our benchmarks by 45%. But switching from a commonly used method for finding a number in a range to our fastest method reduced our benchmark time by about 66%, in other words reducing execution time to one third of the original time."

Slide 95

Slide 95 text

Questions?

Slide 96

Slide 96 text

Questions?