October 09, 2016
# Paradoxes and theorems every developer should know

## Transcript


Disclaimer: I am neither a scientist nor a mathematician.

### German Tank Problem

Observed serial numbers: 15, 53, 72, 8

k = number of observed elements (here 4)
m = largest observed number (here 72)

Estimated total: m + (m / k) - 1
72 + (72 / 4) - 1 = 89
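
The arithmetic on the slide can be sketched in a few lines of Python. The function name `estimate_total` is an assumption made for illustration; the formula is the one used above, m + m/k - 1.

```python
def estimate_total(serials):
    """Estimate how many items exist, given a sample of their serial numbers."""
    k = len(serials)   # number of observed elements
    m = max(serials)   # largest observed number
    return m + m / k - 1

observed = [15, 53, 72, 8]
print(estimate_total(observed))  # 89.0
```

The estimate only makes sense when serial numbers start at 1 and are handed out sequentially, which is exactly the property that leaks information.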

| Month | Intelligence | Statistics | Actual |
|---|---|---|---|
| June 1940 | 1000 | 169 | |
| June 1941 | 1550 | 244 | |
| August 1942 | 1550 | 327 | |

https://en.wikipedia.org/wiki/German_tank_problem

➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.

Monthly Invoice IDs

| Month | Series 1 | Series 2 | Series 3 |
|---|---|---|---|
| Jan | | 2476 | 2303 |
| Feb | | 10718 | 14891 |
| Mar | | 19413 | 27858 |
| Apr | | 28833 | 41458 |
| May | | 38644 | 55429 |
| Jun | | 48633 | 69954 |
| Jul | 102606 | 59027 | 84961 |
| Aug | 109331 | 69715 | 100308 |
| Sep | 116388 | 80684 | 116020 |
| Oct | 123721 | 91935 | 132004 |
| Nov | 131241 | 103455 | 148341 |
| Dec | 139236 | 115276 | 164976 |

Estimated subscriptions

| Month | Series 1 | Series 2 | Series 3 |
|---|---|---|---|
| Jan | | | |
| Feb | | 8242 | 12588 |
| Mar | | 8695 | 12967 |
| Apr | | 9420 | 13600 |
| May | | 9811 | 13971 |
| Jun | | 9989 | 14525 |
| Jul | | 10394 | 15007 |
| Aug | 6725 | 10688 | 15347 |
| Sep | 7057 | 10969 | 15712 |
| Oct | 7333 | 11251 | 15984 |
| Nov | 7520 | 11520 | 16337 |
| Dec | 7995 | 11821 | 16635 |

Estimated growth / size

| Month | Series 1 | Series 2 | Series 3 |
|---|---|---|---|
| Jan | | | |
| Feb | | | |
| Mar | | 105% | 103% |
| Apr | | 108% | 105% |
| May | | 104% | 103% |
| Jun | | 102% | 104% |
| Jul | | 104% | 103% |
| Aug | | 103% | 102% |
| Sep | 105% | 103% | 102% |
| Oct | 104% | 103% | 102% |
| Nov | 103% | 102% | 102% |
| Dec | 106% | 103% | 102% |
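
The subscription and growth figures follow from nothing more than the invoice IDs: each month's volume is the difference between two consecutive IDs, and growth is the ratio of two consecutive volumes. A minimal sketch, using the first six months of the second data column from the invoice table; the function names are hypothetical.

```python
# Invoice IDs seen once per month (Jan to Jun, second data column of the table).
month_ids = {"Jan": 2476, "Feb": 10718, "Mar": 19413,
             "Apr": 28833, "May": 38644, "Jun": 48633}

def monthly_volumes(ids):
    """Invoices issued between two consecutive observations."""
    months = list(ids)
    return {cur: ids[cur] - ids[prev] for prev, cur in zip(months, months[1:])}

def monthly_growth(volumes):
    """Each month's volume as a rounded percentage of the previous month's."""
    months = list(volumes)
    return {cur: round(100 * volumes[cur] / volumes[prev])
            for prev, cur in zip(months, months[1:])}

volumes = monthly_volumes(month_ids)
print(volumes)                  # {'Feb': 8242, 'Mar': 8695, 'Apr': 9420, 'May': 9811, 'Jun': 9989}
print(monthly_growth(volumes))  # {'Mar': 105, 'Apr': 108, 'May': 104, 'Jun': 102}
```

One invoice per month is all an outsider needs to collect for this to work.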

➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs, you don't need UUIDs).
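
As a rough illustration of both options, the sketch below generates a random UUID with Python's standard `uuid` module and a shorter time-based ID. The `short_time_id` layout (millisecond timestamp in hex followed by random hex characters) is an assumption made for this example, not a format taken from the talk.

```python
import secrets
import time
import uuid

def random_id():
    """Random (version 4) UUID: carries no ordering or count information."""
    return str(uuid.uuid4())

def short_time_id(random_bytes=5):
    """Hypothetical short ID: hex millisecond timestamp plus a random hex suffix.

    It reveals when a record was created, but not how many records exist.
    """
    millis = time.time_ns() // 1_000_000
    return f"{millis:x}{secrets.token_hex(random_bytes)}"

print(random_id())
print(short_time_id())
```

Whether a creation timestamp is acceptable to expose depends on the application; if it is not, the purely random variant is the safer default.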

### Confirmation Bias

Hypothesis....

Evidence!

Hypothesis confirmed!

2 4 6
Z={…,−2,−1,0,1,2,…}

21%
5 8 ? ?
If a card shows an even number on one face,
then its opposite face must be blue.
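
The logic behind the card task can be written down directly: a rule of the form "if P then Q" can only be broken by a card that shows P (its hidden side might not be Q) or a card that shows not-Q (its hidden side might be P). The sketch below assumes the two colour cards are blue and red, since the transcript only shows them as `?`.

```python
def must_turn(face):
    """True if the hidden side of this card could break the rule
    'if a card shows an even number, its opposite face is blue'."""
    if isinstance(face, int):
        return face % 2 == 0   # even number: the hidden colour might not be blue
    return face != "blue"      # non-blue colour: the hidden number might be even

visible = [5, 8, "blue", "red"]  # the colours are assumptions for illustration
print([face for face in visible if must_turn(face)])  # [8, 'red']
```

Turning the blue card is the typical confirmation-bias move: whatever is on its other side, it cannot contradict the rule.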

< 10%
coke beer 35 17
If you drink beer
then you must be 18 yrs or older.

for social exchange
hint:
problem" in a more social context.

Question:
> 50% chance
4 March
18 September
5 December
25 July
2 February
9 October

23 people
366* persons = 100%
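
Both numbers can be checked with a short calculation: multiply the chances that each new person misses all birthdays already taken, then subtract from 1. This is a minimal sketch that assumes 365 equally likely birthdays; the function name is hypothetical.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

print(round(shared_birthday_probability(23), 4))  # 0.5073
print(shared_birthday_probability(366))           # 1.0
```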
Collisions occur more often than you realize.
Hash collisions
16-bit value
300 elements

rand(1,100000)
117 elements
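
For hash-sized value spaces, the usual birthday approximation is handy: with n random values out of d possibilities, the collision chance is roughly 1 - exp(-n(n-1) / 2d). The sketch below applies it to the 16-bit case; the function names are assumptions, and it presumes uniformly distributed values.

```python
import math

def approx_collision_probability(n, space):
    """Birthday approximation for at least one duplicate among n random values."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))

def items_for_even_odds(space):
    """Approximate number of values needed for a 50% collision chance."""
    return math.ceil(math.sqrt(2 * space * math.log(2)))

print(round(approx_collision_probability(300, 2**16), 2))  # 0.5
print(items_for_even_odds(2**16))                          # 302
```

The same two functions can be called with any other value space, for example the 100000 possible outputs of `rand(1,100000)`.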
Watch out for:
➡ Hashes that are too small.
➡ Unique data.
➡ Your data might be less "protected" than you might think.

### Heisenberg uncertainty principle

x = position
p = momentum (mass × velocity)
ħ = reduced Planck constant, 1.054571800 × 10⁻³⁴ J·s
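
These three symbols belong to the uncertainty relation itself, which did not survive in the transcript. In its common textbook form, with σ denoting the standard deviation of a measurement (the slide may have used the Δ notation instead, which is an assumption here), it reads:

$$\sigma_x \, \sigma_p \ge \frac{\hbar}{2}$$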

The more precisely you know one property, the less precisely you can know the other.
observing!

Observer effect
heisenbug

### Benford's law

Numbers beginning with 1 are more common than numbers beginning with 9.
Default behavior for natural numbers.
find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

| Count | Leading digit |
|---|---|
| 1073 | 1 |
| 886 | 2 |
| 636 | 3 |
| 372 | 4 |
| 352 | 5 |
| 350 | 6 |
| 307 | 7 |
| 247 | 8 |
| 222 | 9 |
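
To see how close these line counts come to the law, the observed shares can be compared with Benford's expected share for each leading digit d, which is log10(1 + 1/d). A small sketch using the counts above; the function name is hypothetical.

```python
import math

# Leading-digit counts copied from the command output above.
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352,
            6: 350, 7: 307, 8: 247, 9: 222}

def benford_share(digit):
    """Expected share of numbers whose first digit is `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)

total = sum(observed.values())
for digit, count in observed.items():
    print(f"{digit}: observed {count / total:6.1%}  expected {benford_share(digit):6.1%}")
```

The match is not exact, but the overall shape (many 1s, few 9s) is clearly visible.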

### Bayesian filtering

What's the probability of an event, based on conditions that might be related to the event?
What is the chance that a message is spam when it contains certain words?

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

P(A|B): probability of event A, given event B (conditional)
P(A): probability of event A
P(B): probability of event B
P(B|A): probability of event B, given event A

➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".

Negative: "Your product is horrible and does not work properly. Also, you suck."

Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"
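
Plugging the comment counts into Bayes' theorem gives the chance that a comment containing "horrible" is negative. This is a minimal sketch: A stands for "comment is negative", B for "comment contains the word horrible", and the variable names are assumptions.

```python
from fractions import Fraction

total_comments = 50
negative = 10                 # comments marked "negative"
with_horrible = 25            # comments containing "horrible"
negative_with_horrible = 8    # negative comments containing "horrible"

p_a = Fraction(negative, total_comments)                # P(A)
p_b = Fraction(with_horrible, total_comments)           # P(B)
p_b_given_a = Fraction(negative_with_horrible, negative)  # P(B|A)

p_a_given_b = p_b_given_a * p_a / p_b                   # P(A|B)
print(p_a_given_b, float(p_a_given_b))  # 8/25 0.32
```

So even though most negative comments use the word, only about a third of the comments that contain it are actually negative, as the positive example shows.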
