Paradoxes and theorems every developer should know

Joshua Thijssen (@jaytaph)

Joshua Thijssen
Consultant and trainer @ NoxLogic
Founder of TechAnalyze.io (www.techanalyze.io)
Symfony Rainbow Books author
Mastering the SPL author
Blog: http://adayinthelifeof.nl
Email: [email protected]
Twitter: @jaytaph

https://dutchtechrecruitment.nl/

Disclaimer: I'm not a (mad) scientist nor a mathematician.

German Tank Problem

Observed serial numbers: 53, 72, 8, 15

N ≈ m + (m / k) - 1, where k = number of elements and m = largest number

72 + (72 / 4) - 1 = 89
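The estimate above can be sketched in a few lines of Python. This is a minimal illustration of the m + m/k - 1 rule from the slide; the function name `estimate_total` is made up for this example.

```python
def estimate_total(serials):
    """Estimate the total count from a sample of observed serial numbers.

    Uses the rule from the slide: m + m/k - 1, where k is the number of
    observed elements and m is the largest observed number.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


print(estimate_total([53, 72, 8, 15]))  # 89.0
```

The same function works on anything that leaks sequential numbers, such as user IDs or invoice IDs seen in URLs.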

Month         Intelligence   Statistics   Actual
June 1940     1000           169          122
June 1941     1550           244          271
August 1942   1550           327          342

https://en.wikipedia.org/wiki/German_tank_problem

➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.
➡ Calculate approximations of datasets with (incomplete) information.

➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs, you don't need UUIDs)

Collecting (big) data is easy. Analyzing big data is the hard part.

Confirmation Bias

2, 4, 6
Z = {…, −2, −1, 0, 1, 2, …}

21%

5   8   ?   ?
If a card shows an even number on one face, then its opposite face is blue.

< 10%

coke   beer   35   17
If you drink beer, then you must be 18 years or older.

Cognitive adaptation for social exchange

Hint: try to place your "technical problem" in a more social context.

BDD

5   8   ?   ?
If a card shows an even number on one face, then its opposite face is blue.

TESTING

➡ Step 1: Write code
➡ Step 2: Write tests
➡ Step 3: Profit

public function isLeapYeap($year) {
    return ($year % 4 == 0);
}

testIs1996ALeapYeap();
testIs2000ALeapYeap();
testIs2004ALeapYeap();
testIs2008ALeapYeap();
testIs2012ALeapYeap();
testIs1997NotALeapYear();
testIs1998NotALeapYear();
testIs2001NotALeapYear();
testIs2013NotALeapYear();

https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing


➡ Tests were written based on actual code.
➡ Tests were written to CONFIRM actual code, not to DISPROVE actual code!
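To make the difference concrete, here is a small Python sketch (the slide's code is PHP; the names `is_leap_year_naive`, `is_leap_year` and `failures` are hypothetical). The naive version mirrors the "divisible by 4" logic from the slide and passes all nine confirming cases, but fails as soon as the cases try to disprove it with century years.

```python
def is_leap_year_naive(year):
    # Same logic as the slide's isLeapYeap(): divisible by 4 only.
    return year % 4 == 0


def is_leap_year(year):
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The cases from the slide: they only confirm what the code already does.
confirming = {1996: True, 2000: True, 2004: True, 2008: True, 2012: True,
              1997: False, 1998: False, 2001: False, 2013: False}

# Cases chosen to try to break the code (an assumption of this sketch).
disproving = {1900: False, 2100: False, 1600: True}


def failures(fn, cases):
    """Return the years for which fn gives the wrong answer."""
    return [year for year, expected in cases.items() if fn(year) != expected]


print(failures(is_leap_year_naive, confirming))  # []
print(failures(is_leap_year_naive, disproving))  # [1900, 2100]
print(failures(is_leap_year, {**confirming, **disproving}))  # []
```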

TDD

➡ Step 1: Write tests
➡ Step 2: Write code
➡ Step 3: Profit, as this is less prone to confirmation bias (as there is nothing to bias!)

Birthday paradox

Question: how many people do you need for a > 50% chance that two of them share a birthday?
4 March, 18 September, 5 December, 25 July, 2 February, 9 October

23 people

366 persons = 100%
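Both numbers can be checked with a short Python sketch. It assumes 365 equally likely birthdays (no leap days); the function name `shared_birthday_probability` is made up for this example.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: more people than possible birthdays
    p_all_distinct = 1.0
    for i in range(people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


print(round(shared_birthday_probability(22), 3))  # 0.476
print(round(shared_birthday_probability(23), 3))  # 0.507
print(shared_birthday_probability(366))           # 1.0
```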

Collisions occur more often than you realize

Hash collisions

16 bits means about 300 values before a > 50% collision probability
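The figure for 16 bits can be reproduced with the same birthday calculation, applied to a hash space of 2^bits values. This is a sketch that assumes a perfectly uniform hash; the function name `values_until_collision_likely` is hypothetical.

```python
def values_until_collision_likely(bits, threshold=0.5):
    """Smallest number of hashed values whose collision probability exceeds threshold."""
    space = 2 ** bits
    p_all_distinct = 1.0
    n = 0
    while 1.0 - p_all_distinct <= threshold:
        # Adding one more value: it must miss the n values already present.
        p_all_distinct *= (space - n) / space
        n += 1
    return n


# Roughly 300 for a 16-bit hash, even though there are 65536 possible values.
print(values_until_collision_likely(16))
```

As a rule of thumb the answer grows with the square root of the hash space, so every extra bit buys less protection than intuition suggests.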

Watch out for:
➡ Too small hashes.
➡ Unique data.
➡ Your data might be less "protected" than you might think.

Heisenberg uncertainty principle

It's not about Star Trek (Heisenberg compensators)

nor crystal meth

Δx · Δp ≥ ħ / 2
x: position
p: momentum (mass × velocity)
ħ: 0.0000000000000000000000000000000001054571800 (1.054571800E-34)

The more precisely you know one property, the less precisely you know the other.

This is NOT about observing!

Observer effect: heisenbug

It's about trade-offs

Benford's law

Numbers beginning with 1 are more common than numbers beginning with 9.

Default behavior for naturally occurring numbers.

find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

1073 1
 886 2
 636 3
 372 4
 352 5
 350 6
 307 7
 247 8
 222 9
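For comparison, Benford's law predicts that the leading digit d occurs with probability log10(1 + 1/d). The Python sketch below puts that prediction next to the counts from the slide; the function name `benford_share` is made up for this example.

```python
import math

# Leading-digit counts of PHP file line counts, as shown on the slide.
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}


def benford_share(digit):
    """Expected share of numbers starting with `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)


total = sum(observed.values())
for digit, count in observed.items():
    print(f"{digit}: observed {count / total:6.1%}  expected {benford_share(digit):6.1%}")
```

The observed shares follow the same downward trend as the prediction without matching it exactly, which is what you would expect from one modest codebase.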

Bayesian filtering

What's the probability of an event, based on conditions that might be related to the event?

What is the chance that a message is spam when it contains certain words?

P(A|B) = P(B|A) · P(A) / P(B)
P(A|B): probability of event A, given event B (conditional)
P(A): probability of event A
P(B): probability of event B
P(B|A): probability of event B, given event A

➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".

negative: 10 comments
"horrible": 25 comments
both negative and "horrible": 8 comments
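Plugging the comment counts into Bayes' theorem gives the chance that a comment is negative when it contains "horrible". A minimal Python sketch, with variable names invented for this example:

```python
total_comments = 50
negative = 10               # comments marked "negative"
horrible = 25               # comments containing "horrible"
horrible_and_negative = 8   # comments that are both

p_negative = negative / total_comments                        # P(A)   = 0.2
p_horrible = horrible / total_comments                        # P(B)   = 0.5
p_horrible_given_negative = horrible_and_negative / negative  # P(B|A) = 0.8

# Bayes: P(A|B) = P(B|A) * P(A) / P(B)
p_negative_given_horrible = p_horrible_given_negative * p_negative / p_horrible
print(round(p_negative_given_horrible, 2))  # 0.32
```

The result matches the direct count: 8 of the 25 comments containing "horrible" are negative.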

➡ More words?
➡ Complex algorithm,
➡ but we can assume that words are independent of each other
➡ Naive Bayes approach

We must know beforehand which comments are negative?

TRAINING SET

Negative: "Your product is horrible and does not work properly. Also, you suck."
Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"

➡ You might want to filter stop-words first.
➡ You might want to make sure negations are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.

➡ Collaborative filtering (Mahout):
➡ If a user likes products A, B and C, what is the chance that they like product D?

Mess up your (training) data, and nothing can save you (except a training set reboot)

Binomial probability
➡ 30% chance of acceptance for a CFP
➡ 5 CFPs
1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
83% chance of getting selected at least once!
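The same calculation in Python, as a small sketch that assumes each CFP is accepted independently with the same probability. The function names are made up for this example; `exactly` is the general binomial formula, of which "at least once" is the complement of the zero-successes case.

```python
import math


def at_least_one_success(p, attempts):
    """Probability of one or more successes in `attempts` independent tries."""
    return 1 - (1 - p) ** attempts


def exactly(successes, attempts, p):
    """Binomial probability of exactly `successes` successes."""
    return math.comb(attempts, successes) * p ** successes * (1 - p) ** (attempts - successes)


print(round(at_least_one_success(0.3, 5), 3))  # 0.832
print(round(exactly(0, 5, 0.3), 3))            # 0.168
```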

http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

Find me on twitter: @jaytaph
Find me for development and training: www.noxlogic.nl / www.techademy.nl
Find me on email: [email protected]
Find me for blogs: www.adayinthelifeof.nl
