Paradoxes and theorems every developer should know

Joshua Thijssen (@jaytaph), <?php namespace

Disclaimer: I'm neither a (mad) scientist nor a mathematician.

Second disclaimer: I will only tell lies.

German Tank Problem

Observed serial numbers: 53, 72, 8, 15

$\hat{N} = m + \frac{m}{k} - 1$, where k = number of elements (observed serial numbers) and m = largest number observed.

With k = 4 and m = 72: 72 + (72 / 4) - 1 = 89
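The estimator fits in a few lines of PHP. This is a minimal sketch: the function name `estimateTotal` is hypothetical, and it assumes serial numbers start at 1 and every captured tank is counted once.

```php
<?php

/**
 * Estimate the total number of items from observed serial numbers
 * using N = m + m/k - 1 (m = largest observed, k = sample size).
 * Hypothetical helper; assumes serials start at 1 and are unique.
 */
function estimateTotal(array $serials): float
{
    $k = count($serials);
    if ($k === 0) {
        throw new InvalidArgumentException('Need at least one serial number');
    }
    $m = max($serials);

    return $m + $m / $k - 1;
}

echo estimateTotal([53, 72, 8, 15]), PHP_EOL; // 89
```

The same function works for any leaked sequence: user IDs, invoice numbers, order numbers.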

Month         Intelligence estimate   Statistical estimate   Actual
June 1940     1000                    169                    122
June 1941     1550                    244                    271
August 1942   1550                    327                    342

Source: https://en.wikipedia.org/wiki/German_tank_problem

➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.
➡ Calculate approximations of datasets with (incomplete) information.

➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs; you don't need full UUIDs).
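As one way to stop leaking counts, here is a sketch that generates random version-4 UUIDs with PHP's built-in `random_bytes()`. The helper name `uuidV4` is hypothetical, and the time-based short IDs preferred above are not shown; in a real project a maintained UUID library may be the better choice.

```php
<?php

/**
 * Generate a random (version 4) UUID from a CSPRNG.
 * Unlike auto-increment IDs, these reveal nothing about how many
 * records exist or in which order they were created.
 */
function uuidV4(): string
{
    $bytes = random_bytes(16);
    // Set the version (4) in the high nibble of byte 6.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the RFC 4122 variant (binary 10xx) in byte 8.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters, split into chunks of 4 and laid out as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), PHP_EOL;
```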

Collecting (big) data is easy. Analyzing big data is the hard part.

Confirmation Bias

2, 4, 6 fit a hidden rule. Which rule? Test it with any triple from Z = {…, −2, −1, 0, 1, 2, …}

21% (of people find the rule)

Cards: 5, 8, and two cards showing only a color.
Rule: If a card shows an even number on one face, then its opposite face is blue.
Which cards must you turn over to test the rule?

< 10% (answer correctly)

Cards: coke, beer, 35, 17.
Rule: If you drink beer, then you must be 18 years or older.

Cognitive adaptation for social exchange

Hint: try to place your "technical problem" in a more social context.

BDD

Cards: 5, 8, ?, ?
Rule: If a card shows an even number on one face, then its opposite face is blue.

TESTING

➡ Step 1: Write code
➡ Step 2: Write tests
➡ Step 3: Profit

public function isLeapYear($year)
{
    // Ignores the century rule: 1900 and 2100 are NOT leap years.
    return ($year % 4 == 0);
}

Tests (https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing):
testIs1996ALeapYear();
testIs2000ALeapYear();
testIs2004ALeapYear();
testIs2008ALeapYear();
testIs2012ALeapYear();
testIs1997NotALeapYear();
testIs1998NotALeapYear();
testIs2001NotALeapYear();
testIs2013NotALeapYear();

➡ Tests were written based on the actual code.
➡ Tests were written to CONFIRM the actual code, not to DISPROVE it!
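For contrast, here is a sketch of tests written to disprove the code: they target the century edge cases (1900, 2100) that the original test list never tries. It assumes a standalone `isLeapYear()` function instead of the class method shown earlier, and plain checks instead of a test framework such as PHPUnit.

```php
<?php

/**
 * Gregorian leap year rule: divisible by 4, except centuries,
 * unless the century is also divisible by 400.
 */
function isLeapYear(int $year): bool
{
    return ($year % 4 === 0 && $year % 100 !== 0) || $year % 400 === 0;
}

// Cases chosen to try to BREAK the implementation.
$cases = [
    1900 => false, // century, not divisible by 400
    2000 => true,  // century divisible by 400
    2100 => false, // century, not divisible by 400
    1996 => true,
    1997 => false,
];

foreach ($cases as $year => $expected) {
    $actual = isLeapYear($year);
    printf("%d: %s%s\n", $year, $actual ? 'leap' : 'common', $actual === $expected ? '' : '  FAIL');
}
```

Run against the `$year % 4 == 0` version, the 1900 and 2100 cases would report FAIL.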

TDD

➡ Step 1: Write tests
➡ Step 2: Write code
➡ Step 3: Profit, as it's less prone to confirmation bias (there is no code yet to be biased by!)

Birthday paradox

Question: how many people do you need for a > 50% chance that two of them share a birthday? (4 March, 18 September, 5 December, 25 July, 2 February, 9 October)

23 people

366 people = 100% (with 365 possible birthdays, two people must share one)
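The 23 and 366 figures can be checked with the usual complement trick: compute the probability that everyone has a different birthday and subtract it from 1. A minimal sketch, assuming 365 equally likely birthdays; `birthdayCollision` is a hypothetical name.

```php
<?php

/**
 * Probability that at least two of $people share a birthday,
 * assuming $days equally likely birthdays (no 29 February).
 */
function birthdayCollision(int $people, int $days = 365): float
{
    if ($people > $days) {
        return 1.0; // pigeonhole principle: a collision is guaranteed
    }
    $allDistinct = 1.0;
    for ($i = 0; $i < $people; $i++) {
        // Person $i must avoid the $i birthdays already taken.
        $allDistinct *= ($days - $i) / $days;
    }

    return 1.0 - $allDistinct;
}

printf("22 people: %.3f\n", birthdayCollision(22));   // 0.476
printf("23 people: %.3f\n", birthdayCollision(23));   // 0.507
printf("366 people: %.3f\n", birthdayCollision(366)); // 1.000
```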

Collisions occur more often than you realize.

Hash collisions

A 16-bit hash has 65,536 possible values, yet after only about 300 values the chance of a collision is already > 50%.
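The same calculation answers the hash question: keep adding random values until a collision becomes more likely than not. A sketch assuming a uniformly distributed hash; `valuesForHalfCollision` is a hypothetical helper, and the square-root formula is the standard birthday-bound approximation.

```php
<?php

/**
 * Smallest number of random values that gives a > 50% chance of at
 * least one collision in a hash space of $space possible values.
 * Assumes every hash value is equally likely.
 */
function valuesForHalfCollision(int $space): int
{
    $allDistinct = 1.0;
    $n = 0;
    while (1.0 - $allDistinct <= 0.5) {
        // Value number $n + 1 must avoid the $n hashes already used.
        $allDistinct *= ($space - $n) / $space;
        $n++;
    }

    return $n;
}

// Approximation: n ≈ sqrt(2 * ln(2) * space)
$space = 2 ** 16;
printf("exact: %d, approx: %.1f\n", valuesForHalfCollision($space), sqrt(2 * log(2) * $space));
```

With `$space = 365` the function returns 23, the birthday paradox again.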

Watch out for:
➡ Hashes that are too small.
➡ Unique data.
➡ Your data might be less "protected" than you think.

Heisenberg uncertainty principle

It's not about Star Trek (Heisenberg compensators)

nor crystal meth.

$\sigma_x \sigma_p \geq \frac{\hbar}{2}$

x: position; p: momentum (mass × velocity); ħ = 1.054571800 × 10^-34 J·s (0.0000000000000000000000000000000001054571800)

The more precisely you know one property, the less precisely you can know the other.

This is NOT about observing!

Observer effect (the "heisenbug")

It's about trade-offs.

Benford's law

In many real-world data sets, numbers beginning with 1 are far more common than numbers beginning with 9 (about 30% versus less than 5%).

Default behavior for naturally occurring numbers.

find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

   1073 1
    886 2
    636 3
    372 4
    352 5
    350 6
    307 7
    247 8
    222 9
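Benford's law predicts a share of log10(1 + 1/d) for leading digit d. A sketch comparing that prediction with the first-digit counts from the output above; `benford` is a hypothetical helper name, and the observed counts are the ones shown on the slide.

```php
<?php

/**
 * Benford's law: expected share of numbers whose first digit is $d (1-9).
 */
function benford(int $d): float
{
    return log10(1 + 1 / $d);
}

// First-digit counts of line numbers per PHP file, from the output above.
$observed = [1 => 1073, 2 => 886, 3 => 636, 4 => 372, 5 => 352,
             6 => 350, 7 => 307, 8 => 247, 9 => 222];
$total = array_sum($observed);

foreach ($observed as $digit => $count) {
    printf("%d: observed %5.1f%%, Benford %5.1f%%\n",
        $digit, 100 * $count / $total, 100 * benford($digit));
}
```

The nine expected shares sum to exactly 1, since the logarithms telescope to log10(10).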

Bayesian filtering

What is the probability of an event, based on conditions that might be related to that event?

What is the chance that a message is spam when it contains certain words?

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

P(A|B): probability of event A, given event B (conditional)
P(A): probability of event A
P(B): probability of event B
P(B|A): probability of event B, given event A

➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".

negative: 10 comments
"horrible": 25 comments
negative and "horrible": 8 comments
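Plugging the comment numbers into Bayes' theorem, with A = "comment is negative" and B = "comment contains horrible", gives the following short worked sketch. The variable names are illustrative.

```php
<?php

// P(A|B) = P(B|A) * P(A) / P(B)
// A = comment is "negative", B = comment contains "horrible".
$total = 50;
$negative = 10;           // comments marked "negative"
$horrible = 25;           // comments containing "horrible"
$negativeAndHorrible = 8; // comments that are both

$pA = $negative / $total;                     // P(negative) = 0.2
$pB = $horrible / $total;                     // P("horrible") = 0.5
$pBgivenA = $negativeAndHorrible / $negative; // P("horrible" | negative) = 0.8

$pAgivenB = $pBgivenA * $pA / $pB;
printf("P(negative | \"horrible\") = %.2f\n", $pAgivenB); // 0.32
```

As a sanity check, 8 of the 25 "horrible" comments are negative: 8 / 25 = 0.32.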

➡ More words?
➡ Complex algorithm,
➡ but we can (naively) assume that words are independent of each other:
➡ Naive Bayes approach

We must know beforehand which comments are negative?

TRAINING SET

Negative: "Your product is horrible and does not work properly. Also, you suck."
Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"

$trainingset = [
    'negative' => [
        'count' => 1,
        'words' => [
            'product' => 1,
            'horrible' => 1,
            'properly' => 1,
            'suck' => 1,
        ],
    ],
    'positive' => [
        'count' => 1,
        'words' => [
            'horrible' => 1,
            'experience' => 1,
            'product' => 1,
            'thank' => 1,
        ],
    ],
];

$trainingset = [
    'negative' => [
        'count' => 631,
        'words' => [
            'product' => 521,
            'horrible' => 52,
            'properly' => 36,
            'suck' => 272,
        ],
    ],
    'positive' => [
        'count' => 1263,
        'words' => [
            'horrible' => 62,
            'experience' => 16,
            'product' => 311,
            'great' => 363,
            'thank' => 63,
        ],
    ],
];
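A minimal sketch of how such a training set could classify a new comment with multinomial Naive Bayes. It assumes `words` holds word occurrence counts per class and that the input is already split into words; `classify` is a hypothetical name, and add-one (Laplace) smoothing is a choice made here so unseen words don't zero out a class, not something stated in the talk.

```php
<?php

/**
 * Multinomial Naive Bayes with add-one (Laplace) smoothing.
 * Assumption: $trainingset has the shape used above, where
 * 'count' = number of comments in the class and
 * 'words' = word => number of occurrences in that class.
 */
function classify(array $trainingset, array $words): string
{
    $totalComments = array_sum(array_column($trainingset, 'count'));

    // Vocabulary across all classes, used for smoothing.
    $vocabulary = [];
    foreach ($trainingset as $data) {
        $vocabulary += $data['words'];
    }
    $vocabularySize = count($vocabulary);

    $bestClass = '';
    $bestScore = -INF;
    foreach ($trainingset as $class => $data) {
        $wordTotal = array_sum($data['words']);
        // Sum logarithms instead of multiplying, to avoid underflow.
        $score = log($data['count'] / $totalComments); // prior P(class)
        foreach ($words as $word) {
            $occurrences = $data['words'][$word] ?? 0;
            // The "naive" part: each word contributes independently.
            $score += log(($occurrences + 1) / ($wordTotal + $vocabularySize));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestClass = $class;
        }
    }

    return $bestClass;
}

$trainingset = [
    'negative' => [
        'count' => 631,
        'words' => ['product' => 521, 'horrible' => 52, 'properly' => 36, 'suck' => 272],
    ],
    'positive' => [
        'count' => 1263,
        'words' => ['horrible' => 62, 'experience' => 16, 'product' => 311, 'great' => 363, 'thank' => 63],
    ],
];

echo classify($trainingset, ['product', 'suck']), PHP_EOL; // negative
```

Note that the class priors matter: with twice as many positive comments, a single weak word such as "horrible" is not enough on its own to tip a comment to negative.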

➡ You might want to filter stop-words first.
➡ You might want to make sure negations are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.
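A sketch of the first two tips: lowercase the text, drop stop-words, and glue a negation onto the next word so "not great" becomes its own token. The stop-word list and negation words are a tiny illustrative sample, and `tokenize` is a hypothetical name. Spotting sarcasm is left as the bonus exercise.

```php
<?php

/**
 * Turn a comment into tokens for a Bayesian classifier:
 * lowercase, drop stop-words, and attach a negation to the next
 * kept word ("not great" => "not_great").
 * Illustrative stop-word and negation lists only.
 */
function tokenize(string $text): array
{
    $stopWords = ['a', 'an', 'and', 'the', 'is', 'it', 'was', 'i', 'you', 'your', 'this'];
    $negations = ['not', "don't", "isn't"];
    $words = preg_split("/[^a-z']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $tokens = [];
    $negate = false;
    foreach ($words as $word) {
        if (in_array($word, $negations, true)) {
            $negate = true; // applies to the next non-stop-word
            continue;
        }
        if (in_array($word, $stopWords, true)) {
            continue;
        }
        $tokens[] = $negate ? 'not_' . $word : $word;
        $negate = false;
    }

    return $tokens;
}

echo implode(' ', tokenize('The product is not great.')), PHP_EOL; // product not_great
```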

➡ Collaborative filtering (Mahout):
➡ If a user likes products A, B and C, what is the chance that they like product D?
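The question above can be estimated naively by counting users. This is a sketch only: it is a plain co-occurrence count, not one of Mahout's recommender algorithms, and `chanceOfLiking` plus the example data are hypothetical.

```php
<?php

/**
 * Estimate P(user likes $target | user likes all of $given) by counting
 * users in the data set. Naive co-occurrence sketch; returns null when
 * no user likes all of $given.
 */
function chanceOfLiking(array $likes, array $given, string $target): ?float
{
    $matching = 0;
    $alsoLikeTarget = 0;
    foreach ($likes as $products) {
        if (array_diff($given, $products) !== []) {
            continue; // this user does not like all $given products
        }
        $matching++;
        if (in_array($target, $products, true)) {
            $alsoLikeTarget++;
        }
    }

    return $matching === 0 ? null : $alsoLikeTarget / $matching;
}

// Hypothetical example data: user => liked products.
$likes = [
    'alice' => ['A', 'B', 'C', 'D'],
    'bob'   => ['A', 'B', 'C'],
    'carol' => ['A', 'B', 'C', 'D', 'E'],
    'dave'  => ['B', 'D'],
];

printf("%.2f\n", chanceOfLiking($likes, ['A', 'B', 'C'], 'D')); // 0.67
```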

Mess up your (training) data, and nothing can save you (except a training set reboot).

➡ Binomial probability

➡ 30% chance of acceptance per CFP
➡ 5 CFPs
1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
An 83% chance of getting selected at least once!
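The same arithmetic as a function: "at least once" is 1 minus the probability of zero acceptances, and the binomial formula also gives the chance of exactly k acceptances. A sketch assuming the 5 CFPs are independent with the same 30% acceptance rate; `choose` and `binomial` are hypothetical helper names.

```php
<?php

/** Binomial coefficient "n choose k" (multiplicative formula). */
function choose(int $n, int $k): int
{
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // After step $i, $result equals C($n - $k + $i, $i), always an integer.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

/** Probability of exactly $k successes in $n independent tries. */
function binomial(int $n, int $k, float $p): float
{
    return choose($n, $k) * $p ** $k * (1 - $p) ** ($n - $k);
}

$n = 5;   // CFPs submitted
$p = 0.3; // assumed acceptance chance per CFP

// At least once = 1 - P(no acceptances at all)
printf("at least once: %.3f\n", 1 - binomial($n, 0, $p)); // 0.832
for ($k = 0; $k <= $n; $k++) {
    printf("exactly %d: %.3f\n", $k, binomial($n, $k, $p));
}
```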

Ockham's Razor

Among competing hypotheses, the one with the fewest assumptions should be selected.

Everything should be made as simple as possible, but no simpler.

YAGNI

Actually,
➡ The principle of plurality: Plurality should not be posited without necessity.
➡ The principle of parsimony: It is pointless to do with more what can be done with less.

➡ Every element you add needs design, development, maintenance, connectivity, support, etc.
➡ When "adding" elements, you are not adding, you are multiplying!

Food for thought: Would Ockham accept a Service-Oriented Architecture?

Image: http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

Find me on Twitter: @jaytaph
Find me for development and training: www.noxlogic.nl / www.techademy.nl
Find me on email: [email protected]
Find me for blogs: www.adayinthelifeof.nl
