Randomness in Python:
Controlled Chaos in an Ordered
by @amandasopkin
Makes processes secure
biologically, philosophically important
Difficult to actually achieve
Why do we need
Problems with randomness
The seed, or starting point
The algorithm
1. Determined that user ids were seeded with restart time
2. Crashed the Hacker News site
3. Predicted restart time
4. Predicted assigned user ids as users logged in
5. Impersonated discovered users
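The steps above hinge on one weakness: a generator seeded with a guessable value can be replayed by anyone who guesses the seed. The sketch below is a hypothetical illustration, not the actual Hacker News code. The names `make_ids` and `recover_seed`, the millisecond timestamp, and the one-second search window are all assumptions chosen for the demo.

```python
import random


def make_ids(seed_time_ms, count=3):
    """Hypothetical server: derive ids from a PRNG seeded with restart time."""
    rng = random.Random(seed_time_ms)
    return [rng.getrandbits(32) for _ in range(count)]


def recover_seed(observed_ids, window_start, window_end):
    """Try every candidate restart time in the window and replay the PRNG."""
    for candidate in range(window_start, window_end):
        if make_ids(candidate, len(observed_ids)) == observed_ids:
            return candidate
    return None


# Assumed scenario: the attacker knows the restart time to within one second.
secret_restart_ms = 1_700_000_123_456
leaked_ids = make_ids(secret_restart_ms)
guess = recover_seed(leaked_ids,
                     secret_restart_ms - 500,
                     secret_restart_ms + 500)
print("seed recovered:", guess == secret_restart_ms)
```

Once the seed is recovered, every later id from the same generator can be computed in advance, which is what makes step 4 possible.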
DUAL_EC_DRBG Controversy
● 2004: Dual EC PRNG introduced
● 08/2007: Shumow and Ferguson present Dual_EC_DRBG flaw at cryptography conference
● 11/2007: Schneier bases article in Wired on their findings
“...would allow NSA to determine the state of the random number generator, and thereby eventually be able to read all data sent over the SSL connection.”
● 09/2013: One of the purposes of Bullrun is described as being "to covertly introduce weaknesses into the encryption standards followed by hardware and software developers around the world."
● NIST recommends removal of the algorithm as a
● 12/2013: Presidential advisory examines encryption
● 2014: Standard is removed
Years until standard removed...
Who did this impact?
Microsoft, Google, Apple, McAfee,
Docker, IBM, Oracle, Cisco, VMWare,
Juniper, HP, Red Hat, Samsung,
Toshiba, DELL, Ruckus, F5 Networks,
Lenovo, Nokia, the RSA BSAFE
libraries for Java and C++ and
Ok, so you want to
create randomness...
An ideal pseudo random number generator should:
1. Pass statistical tests of randomness (such as the monobit, distance, and poker tests)
2. Take a long time before repeating (have a long “period”)
3. Execute efficiently (quick, low storage)
4. Be repeatable
5. Be portable (can be run on any machine or system)
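To make "statistical tests of randomness" concrete, here is a minimal sketch of the monobit (frequency) test, which checks whether ones and zeros are roughly balanced. The statistic follows the usual formulation of the frequency test; the function name `monobit_p_value`, the fixed seed, and the sample sizes are assumptions for illustration.

```python
import math
import random


def monobit_p_value(bits):
    """Frequency (monobit) test: are ones and zeros roughly balanced?"""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum the sequence.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    # A small p-value suggests the sequence is unlikely to be random.
    return math.erfc(s_obs / math.sqrt(2))


rng = random.Random(2024)  # fixed seed so the run is repeatable
prng_bits = [rng.getrandbits(1) for _ in range(10_000)]
print(f"PRNG bits: p = {monobit_p_value(prng_bits):.3f}")
print(f"all ones:  p = {monobit_p_value([1] * 100):.3g}")
```

A sequence is conventionally rejected when the p-value falls below a chosen threshold such as 0.01. Passing this one test is necessary but far from sufficient: a strictly alternating 0, 1, 0, 1 sequence passes it perfectly.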
What are the common
ways of generating
Linear congruential generators
Linear congruential generators take the form
x_k = (a x_{k-1} + c) mod M
where x_0 is the seed, the integer M is the modulus (often chosen close to the largest representable integer), and the period is at most M.
a = 3        # multiplier
c = 9        # increment
m = 16       # modulus
seed = 4394  # starting value

def lcg():
    """Print ten successive values of the generator."""
    xi = seed
    for i in range(10):
        xi = (a*xi + c) % m
        print(xi)

lcg()
Algorithm: xi = (a*xi + c)%m
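The recurrence above repeats quickly when the parameters are poorly chosen. The sketch below measures the cycle length for the slide's parameters (a = 3, c = 9, m = 16, seed 4394). The helper names `lcg` and `period`, and the alternative multiplier a = 5, are assumptions added for comparison and do not come from the talk.

```python
def lcg(seed, a, c, m):
    """Yield successive values of x = (a*x + c) % m, starting from seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def period(seed, a, c, m):
    """Return the length of the cycle the generator eventually falls into."""
    first_seen = {}
    for step, value in enumerate(lcg(seed, a, c, m)):
        if value in first_seen:
            return step - first_seen[value]
        first_seen[value] = step


print(period(4394, a=3, c=9, m=16))  # 8: only half of the 16 possible values
print(period(4394, a=5, c=9, m=16))  # 16: the full period for this modulus
```

With a = 3 the generator visits only 8 of the 16 possible values, while a = 5 reaches all 16, so the "period is at most M" bound is attained only for suitable parameters.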
Towards a better
pseudorandom generator
“Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.”
John von Neumann
Mid square method generally
Start with a 4 digit seed (e.g. 9834)
Square this value (e.g. 96707556)
If the result has fewer than 8 digits, add leading 0s
Take the middle 4 digits of the result (here 7075)
Repeat the sequence
Mid square method
seed_number = int(input("Please enter a four digit number:\n[####] "))
number = seed_number
already_seen = set()
counter = 0
while number not in already_seen:
    counter += 1
    # Record the current value so a repeat ends the loop.
    already_seen.add(number)
    # Square, pad to 8 digits, and keep the middle 4 digits.
    number = int(str(number * number).zfill(8)[2:6])
    print(f"#{counter}: {number}")
print(f"We began with the seed {seed_number}, and"
      f" we repeated ourselves after {counter} steps"
      f" with {number}.")
Mid square method
Please enter a four digit number: [####]
#1: 3278
#2: 7452
#3: 5323
#4: 3343
#5: 1756
#6: 835
#7: 6972
#8: 6087
#9: 515
#10: 2652
...
#59: 24
#60: 5
#61: 0
#62: 0
We began with the seed 5859, and we repeated ourselves after 62 steps with 0.
Issues with mid square method
Relatively slow
Statistically unsatisfactory
Sample of random numbers may be too short
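The "too short" issue can be checked directly by running the method from every possible 4 digit seed and counting how many values appear before the first repeat. This is a rough sketch: `steps_until_repeat` is a hypothetical helper that counts steps the same way as the script above.

```python
def steps_until_repeat(seed):
    """Count mid square steps taken before a value is seen a second time."""
    seen = set()
    number = seed
    steps = 0
    while number not in seen:
        seen.add(number)
        # Square, pad to 8 digits, keep the middle 4 digits.
        number = int(str(number * number).zfill(8)[2:6])
        steps += 1
    return steps


lengths = [steps_until_repeat(seed) for seed in range(10_000)]
print("longest run: ", max(lengths))
print("average run: ", sum(lengths) / len(lengths))
print("seeds that repeat within 10 steps:", sum(n <= 10 for n in lengths))
```

No run can exceed 10,000 steps, because only that many 4 digit values exist, and sequences that reach 0 stay there forever.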
Predicting the mid square method
[Figure: advanced LCG vs. mid square method]
Let’s talk cryptography
The Mersenne Twister
Most used pseudo random number generator
Very long period (the Mersenne prime 2^19937 − 1)
Not cryptographically secure
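A long period does not make the output unpredictable. Python's `random` module uses the Mersenne Twister, and its output is fully determined by its internal state. The sketch below shows that the same seed, or a saved state, replays the same numbers. The helper name `draw` and the seeds are arbitrary choices for the demo.

```python
import random


def draw(seed, n=5):
    """Return the first n floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


print(draw(42) == draw(42))  # same seed, same sequence
print(draw(42) == draw(43))  # different seed, different sequence

# Anyone holding the internal state can replay future output.
rng = random.Random()
state = rng.getstate()
first = rng.random()
rng.setstate(state)
print(first == rng.random())
```

This repeatability is useful for simulations and tests, and it is the reason the generator should not be used for tokens or passwords.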
Predicting the random() module
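from datetime import datetime
from random import random
import matplotlib.pyplot as plt

def uni(n, m, a, c, seed):
    """Return n values from the LCG Xn = (a*Xn + c) % m."""
    sequence = []
    Xn = seed
    for i in range(n):
        Xn = ((a*Xn + c) % m)
        # Scaling to [0, 1) to match random() is an assumption;
        # the slide does not show how the values were stored.
        sequence.append(Xn / m)
    return sequence

x = range(1000)
y_1 = uni(1000, 2**32, 11695477, 1, datetime.now().microsecond)
y_2 = [random() for i in range(1000)]
plt.plot(x, y_1, "o", color="blue")  # advanced LCG
plt.plot(x, y_2, "o", color="red")   # built-in random()
plt.show()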
[Figure: advanced LCG vs. built-in random PRNG]
What’s wrong with the
random module?
Problems with the random module...
secrets module!
The Secrets module
Is cryptographically secure
Includes ready-made “batteries” for users who don’t want to build their own
Uses 32 bytes of entropy by default
A note on entropy...
Natural sources of entropy
Source code of Secrets module
import base64
import binascii
import os
from random import SystemRandom

_sysrand = SystemRandom()

randbits = _sysrand.getrandbits
choice = _sysrand.choice

def randbelow(exclusive_upper_bound):
    return _sysrand._randbelow(exclusive_upper_bound)

DEFAULT_ENTROPY = 32  # number of bytes to return by default

def token_bytes(nbytes=None):
    if nbytes is None:
        nbytes = DEFAULT_ENTROPY
    return os.urandom(nbytes)

def token_hex(nbytes=None):
    return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

def token_urlsafe(nbytes=None):
    tok = token_bytes(nbytes)
    return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')
SystemRandom
Uses OS as a source of randomness
Not available on all systems
Does not rely on software state
Sequences are not repeatable
/dev/random
Will block without sufficient entropy
Relies on “the kernel entropy pool”
Slower than /dev/urandom
/dev/urandom
Will not block without sufficient entropy
Relies on “the kernel entropy pool”
Faster than /dev/random
Theoretically vulnerable to attack
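The properties listed above can be observed from Python through `os.urandom` and `random.SystemRandom`. This is a small sketch: the byte count and seed value are arbitrary, and the seed is passed only to show that it has no effect.

```python
import os
from random import SystemRandom

# Ask the operating system for 16 random bytes.
raw = os.urandom(16)
print(len(raw), raw.hex())

sysrand = SystemRandom()

# seed() is accepted but ignored, so sequences cannot be repeated.
sysrand.seed(1234)
a = sysrand.getrandbits(128)
sysrand.seed(1234)
b = sysrand.getrandbits(128)
print("same output after reseeding:", a == b)

# There is no software state to save or restore.
try:
    sysrand.getstate()
except NotImplementedError as exc:
    print("getstate() is not supported:", exc)
```

Because the values come from the operating system rather than from a seeded algorithm, this source suits secrets but not reproducible simulations.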
Using the secrets module to get tokens
import secrets
token1 = secrets.token_hex(16)
token2 = secrets.token_hex(10)
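The argument to the token functions is a number of random bytes, not the length of the returned string. The sketch below compares the three token helpers for the same byte count; the value 16 is an arbitrary choice for the demo.

```python
import secrets

nbytes = 16  # assumed size for the demo

raw = secrets.token_bytes(nbytes)
hex_token = secrets.token_hex(nbytes)
url_token = secrets.token_urlsafe(nbytes)

print(len(raw))        # 16: one item per byte
print(len(hex_token))  # 32: two hex digits per byte
print(len(url_token))  # 22: Base64 text with the padding stripped
```

Calling any of these without an argument uses the module's default number of bytes.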
Using the secrets module for password
import secrets
import string
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet)
for i in range(10))
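A password policy often requires certain character classes. One common approach, sketched below along the lines of the recipe in the `secrets` documentation, is to keep drawing candidates until one satisfies the policy. The function name `generate_password` and the specific rules (one lowercase letter, one uppercase letter, three digits) are assumptions for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_password(length=10, min_digits=3):
    """Draw candidates until one meets the assumed policy."""
    while True:
        password = ''.join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(ch.islower() for ch in password)
                and any(ch.isupper() for ch in password)
                and sum(ch.isdigit() for ch in password) >= min_digits):
            return password


print(generate_password())
```

Rejecting whole candidates, rather than patching characters into fixed positions, keeps every acceptable password equally likely.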
The secrets module:
not the end all be all.
Python’s “nuclear reactor” of
"...folks really are better off learning to use things
like cryptography.io for security sensitive software, so
this change is just about harm mitigation given that it's
inevitable that a non-trivial proportion of the millions
of current and future Python developers won't do that."
Let’s wrap up...
Is very important for security
Difficult to truly achieve
Can be simulated
Thank you!
● Icons taken from flaticon.com
● https://crypto.stackexchange.com/questions/51232/using-
● https://dev.to/walker/pseudo-random-numbers-in-python-f
● Wired Magazine
● The Washington Post
● Dilbert