@jaytaph 1
Joshua Thijssen
jaytaph
Paradoxes and theorems
every developer should know
Disclaimer:
I'm not a (mad)
scientist nor a
mathematician.
German Tank
Problem
Observed serial numbers: 53, 72, 8, 15
N ≈ m + (m / k) - 1
N = estimated total
k = number of elements
m = largest number
72 + (72 / 4) - 1 = 89
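The calculation above can be reproduced in a few lines of Python. This is a minimal sketch of the estimate m + m/k - 1; the function name `estimate_total` is an assumption made for illustration, and the sample is the four serial numbers shown on the slides.

```python
def estimate_total(serials):
    """Estimate the population size from observed serial numbers.

    Uses m + m/k - 1, where m is the largest serial number seen
    and k is the number of serial numbers observed.
    """
    if not serials:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    k = len(serials)
    return m + m / k - 1


observed = [53, 72, 8, 15]  # serial numbers from the slides
print(estimate_total(observed))  # 72 + 72/4 - 1 = 89.0
```

The estimate only holds up if the serial numbers are issued sequentially and the observed sample is roughly random.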
Estimated German tank production per month
Month         Intelligence   Statistics   Actual
June 1940     1000           169          122
June 1941     1550           244          271
August 1942   1550           327          342
https://en.wikipedia.org/wiki/German_tank_problem
➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of
iPhones sold in 2008.
Monthly invoice IDs
Month   Column 1   Column 2   Column 3
Jan     -          2476       2303
Feb     -          10718      14891
Mar     -          19413      27858
Apr     -          28833      41458
May     -          38644      55429
Jun     -          48633      69954
Jul     102606     59027      84961
Aug     109331     69715      100308
Sep     116388     80684      116020
Oct     123721     91935      132004
Nov     131241     103455     148341
Dec     139236     115276     164976

Estimated subscriptions (difference with the previous month's invoice ID)
Month   Column 1   Column 2   Column 3
Jan     -          -          -
Feb     -          8242       12588
Mar     -          8695       12967
Apr     -          9420       13600
May     -          9811       13971
Jun     -          9989       14525
Jul     -          10394      15007
Aug     6725       10688      15347
Sep     7057       10969      15712
Oct     7333       11251      15984
Nov     7520       11520      16337
Dec     7995       11821      16635

Estimated growth / size (subscriptions relative to the previous month)
Month   Column 1   Column 2   Column 3
Jan     -          -          -
Feb     -          -          -
Mar     -          105%       103%
Apr     -          108%       105%
May     -          104%       103%
Jun     -          102%       104%
Jul     -          104%       103%
Aug     -          103%       102%
Sep     105%       103%       102%
Oct     104%       103%       102%
Nov     103%       102%       102%
Dec     106%       103%       102%
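The subscription estimates are the differences between consecutive monthly invoice IDs, and the growth figures are the ratios between consecutive differences. A minimal sketch, assuming the IDs are issued sequentially and one ID is observed per month; the helper names are hypothetical, and the input is the Jan to Apr values of one series from the slides.

```python
def monthly_counts(ids):
    """Differences between consecutive IDs = invoices created per month."""
    return [later - earlier for earlier, later in zip(ids, ids[1:])]


def growth_percentages(counts):
    """Each month's count as a percentage of the previous month's count."""
    return [round(100 * cur / prev) for prev, cur in zip(counts, counts[1:])]


invoice_ids = [2476, 10718, 19413, 28833]  # Jan..Apr, one invoice ID per month
counts = monthly_counts(invoice_ids)
print(counts)                      # [8242, 8695, 9420]
print(growth_percentages(counts))  # [105, 108]
```

All an outsider needs is one invoice per month, which is why sequential IDs leak this much.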
➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT
solve the issue.
➡ Use UUIDs
(better: time-based short IDs, you don't need UUIDs)
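As a rough illustration of the last point, Python's standard library can generate identifiers that reveal nothing about how many records exist. This sketch uses `uuid.uuid4()` and `secrets.token_urlsafe()`; the function names `new_public_id` and `new_short_id` are hypothetical, and the short variant is purely random rather than time-based, which is a simplification of what the slide suggests.

```python
import secrets
import uuid


def new_public_id():
    """Random (version 4) UUID: carries no counter and no ordering."""
    return str(uuid.uuid4())


def new_short_id(num_bytes=9):
    """Shorter random ID; 9 random bytes encode to 12 URL-safe characters."""
    return secrets.token_urlsafe(num_bytes)


print(new_public_id())
print(new_short_id())
```

One possible design is to keep the sequential primary key internally and expose only such a random identifier in URLs, invoices and e-mails.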
Confirmation Bias
Hypothesis...
Evidence!
Hypothesis confirmed!
2 4 6
Z={…,−2,−1,0,1,2,…}
21%
Don't try to confirm what you know.
Try to disprove it instead.
Confirmation bias is everywhere!
5 8 ? ?
If a card shows an even number on one face,
then its opposite face must be blue.
< 10%
coke beer 35 17
If you drink beer
then you must be 18 yrs or older.
Cognitive adaptation
for social exchange
hint:
Try and place your "technical
problem" in a more social context.
5 8 ? ?
If a card shows an even number on one face,
then its opposite face must be blue.
TDD
Birthday paradox
Question: how many people are needed for a > 50% chance that two of them share a birthday?
4 March
18 September
5 December
25 July
2 February
9 October
23 people
366* people = 100%
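Both numbers can be checked with an exact calculation. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap days; the function names are hypothetical.

```python
def shared_birthday_probability(people, days=365):
    """Exact chance that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole principle: more people than days
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


def people_needed(threshold=0.5, days=365):
    """Smallest group size whose collision chance exceeds `threshold`."""
    people = 1
    while shared_birthday_probability(people, days) <= threshold:
        people += 1
    return people


print(people_needed())                   # 23
print(shared_birthday_probability(366))  # 1.0
```

The probability climbs quickly because every new person is compared against everyone already in the room, so the number of pairs grows quadratically.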
Collisions occur more
often than you realize
Hash collisions
16-bit value
300 elements
random_int(1,100000)
how many attempts before
50% collision chance?
random_int(1,100000)
373 elements
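For large value spaces the collision chance is usually estimated with the approximation 1 - e^(-n(n-1)/(2d)), where n is the number of elements and d the number of possible values. The sketch below applies it to a 16-bit value and to a range of 100,000 values; the function names are hypothetical and the values are assumed to be uniformly random.

```python
import math


def collision_probability(items, space):
    """Approximate chance of at least one collision among `items`
    uniformly random values drawn from `space` possibilities."""
    return 1.0 - math.exp(-items * (items - 1) / (2 * space))


def items_for_probability(space, probability=0.5):
    """Smallest number of items whose approximate collision chance
    reaches `probability`."""
    items = 1
    while collision_probability(items, space) < probability:
        items += 1
    return items


# 16-bit value with 300 elements: just under a 50% collision chance.
print(collision_probability(300, 2 ** 16))

# Range of 100,000 values, as in random_int(1, 100000).
print(items_for_probability(100_000))  # 373
```

As a rule of thumb, the 50% point sits near 1.18 times the square root of the number of possible values.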
Watch out for:
➡ Hashes that are too small.
➡ Unique data.
➡ Your data might be less "protected" than
you might think.
Heisenberg
uncertainty
principle
Δx · Δp ≥ ħ / 2
x = position
p = momentum (mass × velocity)
ħ ≈ 1.054571800E-34 J·s (reduced Planck constant)
The more precisely you
know one property, the less
precisely you know the other.
It's about trade-offs
This is NOT about
observing!
Observer effect
heisenbug
Benford's law
Numbers beginning with 1 are
more common than numbers
beginning with 9.
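Benford's law puts a number on this: the expected share of values with leading digit d is log10(1 + 1/d). A minimal sketch that prints the distribution; the function name is hypothetical.

```python
import math


def benford_probability(digit):
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

A leading 1 is expected in roughly 30% of cases and a leading 9 in under 5%. The law tends to hold for data that spans several orders of magnitude, not for evenly distributed or artificially bounded numbers.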
What's the probability of an
event, based on conditions that
might be related to the event?
What is the chance that a
message is spam when it
contains certain words?
P(A|B) = P(B|A) · P(A) / P(B)
P(A|B) = probability of event A, given event B (conditional)
P(A) = probability of event A
P(B) = probability of event B
P(B|A) = probability of event B, given event A
➡ Figure out the probability a {mail, tweet,
comment, review} is {spam, negative} etc.
➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word
"horrible".
➡ 8 comments with the word "horrible" are
marked as "negative".
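With these counts, Bayes' theorem gives the chance that a comment containing "horrible" is negative. A minimal sketch using only the numbers above; the function and variable names are hypothetical.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


total = 50
negative = 10           # comments marked "negative"
horrible = 25           # comments containing "horrible"
horrible_negative = 8   # comments that are both

p_negative = negative / total                              # P(A)   = 0.2
p_horrible = horrible / total                              # P(B)   = 0.5
p_horrible_given_negative = horrible_negative / negative   # P(B|A) = 0.8

p = bayes(p_horrible_given_negative, p_negative, p_horrible)
print(round(p, 2))  # 0.32
```

The result matches the direct count: 8 of the 25 comments containing "horrible" are negative, which is 32%.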
➡ You might want to filter stop-words first.
➡ You might want to make sure negations are
handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.
➡ Collaborative filtering (Mahout):
➡ If a user likes products A, B and C, what is the
chance that they like product D?
Mess up your (training) data, and nothing can save you
(except a training set reboot)
Find me on twitter: @jaytaph
Find me for development and training:
www.noxlogic.nl / www.techademy.nl
Find me on email: [email protected]