Slide 1

Slide 1 text

@jaytaph 1 Joshua Thijssen jaytaph Paradoxes and theorems every developer should know

Slide 2

Slide 2 text

Disclaimer: I'm not a (mad) scientist nor a mathematician.

Slide 3

Slide 3 text

German Tank Problem

Slide 4

Slide 4 text

15

Slide 5

Slide 5 text


Slide 6

Slide 6 text

53 72 8 15

Slide 7

Slide 7 text

N ≈ m + (m / k) - 1
k = number of elements (sample size)
m = largest number observed

Slide 8

Slide 8 text

72 + (72 / 4) - 1 = 89
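The arithmetic on this slide can be checked with a minimal sketch. The function name `estimate_total` is hypothetical (it is not from the talk), and the sketch assumes serial numbers start at 1 and are sampled without replacement.

```python
def estimate_total(serials):
    """Estimate the population size as m + m/k - 1.

    m is the largest serial number observed, k is the sample size.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


# The four serial numbers shown on the earlier slide.
print(estimate_total([53, 72, 8, 15]))  # 89.0
```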

Slide 12

Slide 12 text

               Intelligence   Statistics   Actual
June 1940              1000          169      122
June 1941              1550          244      271
August 1942            1550          327      342

https://en.wikipedia.org/wiki/German_tank_problem

Slide 13

Slide 13 text


Slide 16

Slide 16 text

➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.

Slide 17

Slide 17 text

Monthly Invoice IDs

Jan        -     2476     2303
Feb        -    10718    14891
Mar        -    19413    27858
Apr        -    28833    41458
May        -    38644    55429
Jun        -    48633    69954
Jul   102606    59027    84961
Aug   109331    69715   100308
Sep   116388    80684   116020
Oct   123721    91935   132004
Nov   131241   103455   148341
Dec   139236   115276   164976

Slide 18

Slide 18 text

Estimated subscriptions (difference between consecutive monthly invoice IDs)

Jan        -        -        -
Feb        -     8242    12588
Mar        -     8695    12967
Apr        -     9420    13600
May        -     9811    13971
Jun        -     9989    14525
Jul        -    10394    15007
Aug     6725    10688    15347
Sep     7057    10969    15712
Oct     7333    11251    15984
Nov     7520    11520    16337
Dec     7995    11821    16635

Slide 19

Slide 19 text

Estimated growth / size (estimated subscriptions relative to the previous month)

Jan        -        -        -
Feb        -        -        -
Mar        -     105%     103%
Apr        -     108%     105%
May        -     104%     103%
Jun        -     102%     104%
Jul        -     104%     103%
Aug        -     103%     102%
Sep     105%     103%     102%
Oct     104%     103%     102%
Nov     103%     102%     102%
Dec     106%     103%     102%
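A minimal sketch of how the estimates on these slides can be derived from leaked invoice IDs, assuming one invoice is observed per month and IDs are handed out sequentially. The helper names `monthly_new` and `growth_percent` are hypothetical; the input is the January to June run of one column from the invoice ID table.

```python
def monthly_new(invoice_ids):
    """Difference between consecutive monthly IDs = invoices issued in between."""
    return [later - earlier for earlier, later in zip(invoice_ids, invoice_ids[1:])]


def growth_percent(counts):
    """Each month's count as a rounded percentage of the previous month's count."""
    return [round(100 * later / earlier) for earlier, later in zip(counts, counts[1:])]


# One invoice ID observed per month, January through June.
invoice_ids = [2476, 10718, 19413, 28833, 38644, 48633]

new_per_month = monthly_new(invoice_ids)
print(new_per_month)                  # [8242, 8695, 9420, 9811, 9989]
print(growth_percent(new_per_month))  # [105, 108, 104, 102]
```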

Slide 20

Slide 20 text

➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs (better: time-based short IDs, you don't need UUIDs).
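As a rough illustration of non-sequential public identifiers, here is a sketch using only Python's standard library. It shows random IDs (a random UUID and a short URL-safe token), not the time-based short IDs the slide prefers, and `new_public_id` is a hypothetical helper name.

```python
import secrets
import uuid


def new_public_id(n_bytes=9):
    """Return a short random URL-safe token (9 random bytes -> 12 characters)."""
    return secrets.token_urlsafe(n_bytes)


# Neither value reveals how many records were created before it.
print(uuid.uuid4())
print(new_public_id())
```

The sequential primary key can stay internal to the database; only the random identifier would appear in URLs, invoices, and e-mails.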

Slide 21

Slide 21 text

Confirmation Bias

Slide 22

Slide 22 text

Hypothesis...

Slide 23

Slide 23 text

Evidence!

Slide 24

Slide 24 text

Hypothesis confirmed!

Slide 25

Slide 25 text


Slide 26

Slide 26 text

2 4 6
Z = {…, −2, −1, 0, 1, 2, …}

Slide 27

Slide 27 text

21%

Slide 28

Slide 28 text

Don't try to confirm what you know. Try to disprove it instead.

Slide 29

Slide 29 text

Confirmation bias is everywhere!

Slide 30

Slide 30 text

5 8 ? ?
If a card shows an even number on one face, then its opposite face must be blue.

Slide 31

Slide 31 text

< 10%

Slide 32

Slide 32 text

coke beer 35 17
If you drink beer then you must be 18 yrs or older.

Slide 35

Slide 35 text

Cognitive adaptation for social exchange

Slide 36

Slide 36 text

Hint: try to place your "technical problem" in a more social context.

Slide 37

Slide 37 text

5 8 ? ?
If a card shows an even number on one face, then its opposite face must be blue.
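The card puzzle can be phrased as "which cards could disprove the rule?". The sketch below assumes the two colored cards (shown as "?" in this transcript) are one blue and one red card; that choice and the helper name `must_flip` are assumptions, not taken from the slides.

```python
def must_flip(visible_face):
    """Return True if turning this card over could falsify the rule
    'an even number on one face implies a blue opposite face'."""
    if isinstance(visible_face, int):
        # An even number might hide a non-blue back; an odd number proves nothing.
        return visible_face % 2 == 0
    # A non-blue face might hide an even number; a blue face can never break the rule.
    return visible_face != "blue"


cards = [5, 8, "blue", "red"]  # the colors are assumed for illustration
print([card for card in cards if must_flip(card)])  # [8, 'red']
```

Only the cards that can disprove the rule need checking, which mirrors the beer example: check the beer drinker and the 17-year-old.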

Slide 40

Slide 40 text

TDD

Slide 41

Slide 41 text

Birthday paradox

Slide 42

Slide 42 text

Question: > 50% chance
4 March, 18 September, 5 December, 25 July, 2 February, 9 October

Slide 43

Slide 43 text

23 people

Slide 44

Slide 44 text

366* persons = 100%
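Both numbers can be reproduced with a short sketch, assuming 365 equally likely birthdays and ignoring leap years. The function name `shared_birthday_probability` is hypothetical.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    all_different = 1.0
    for already_taken in range(people):
        # The next person must avoid every birthday taken so far.
        all_different *= (days - already_taken) / days
    return 1.0 - all_different


print(shared_birthday_probability(22))   # just under 0.5
print(shared_birthday_probability(23))   # just over 0.5
print(shared_birthday_probability(366))  # 1.0
```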

Slide 45

Slide 45 text

Collisions occur more often than you realize

Slide 46

Slide 46 text

Hash collisions

Slide 47

Slide 47 text

16-bit value: about 300 elements

Slide 48

Slide 48 text

random_int(1,100000): how many attempts before 50% collision chance?

Slide 49

Slide 49 text

random_int(1,100000): about 373 elements
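The same reasoning as the birthday paradox gives the number of random draws at which a collision becomes more likely than not. This is a minimal sketch that assumes uniformly distributed values; `draws_for_collision` is a hypothetical helper name.

```python
def draws_for_collision(space_size, target=0.5):
    """Smallest number of uniform random draws from `space_size` values
    for which the chance of at least one collision reaches `target`."""
    all_different = 1.0
    draws = 0
    while 1.0 - all_different < target:
        # The next draw must miss every value drawn so far.
        all_different *= (space_size - draws) / space_size
        draws += 1
    return draws


print(draws_for_collision(365))      # 23  (birthdays)
print(draws_for_collision(2 ** 16))  # 302 (16-bit hash)
print(draws_for_collision(100_000))  # 373 (random_int(1, 100000))
```

As a rule of thumb, the 50% point sits near 1.18 times the square root of the number of possible values.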

Slide 50

Slide 50 text

Watch out for:
➡ Hashes that are too small.
➡ Unique data.
➡ Your data might be less "protected" than you think.

Slide 51

Slide 51 text

Heisenberg uncertainty principle

Slide 52

Slide 52 text


Slide 53

Slide 53 text


Slide 54

Slide 54 text

Δx · Δp ≥ ħ / 2
x = position
p = momentum (mass × velocity)
ħ = 0.0000000000000000000000000000000001054571800 (1.054571800E-34)

Slide 55

Slide 55 text

The more precisely you know one property, the less you know the other.

Slide 56

Slide 56 text

It's about trade-offs

Slide 57

Slide 57 text

This is NOT about observing!

Slide 58

Slide 58 text

Observer effect
heisenbug

Slide 59

Slide 59 text

Benford's law

Slide 60

Slide 60 text

Numbers beginning with 1 are more common than numbers beginning with 9.

Slide 61

Slide 61 text

Default behavior for natural numbers.

Slide 62

Slide 62 text


Slide 63

Slide 63 text

find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

Slide 64

Slide 64 text

find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

1073 1
 886 2
 636 3
 372 4
 352 5
 350 6
 307 7
 247 8
 222 9
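To compare these counts with what Benford's law predicts, a minimal sketch can use the standard formula P(d) = log10(1 + 1/d) for leading digit d. The counts are copied from the output above; the function name `benford_expected` is hypothetical.

```python
import math


def benford_expected(digit):
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


# Leading-digit counts of the per-file line counts shown above.
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}
total = sum(observed.values())

for digit, count in observed.items():
    print(f"{digit}: observed {count / total:.3f}, Benford {benford_expected(digit):.3f}")
```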

Slide 65

Slide 65 text


Slide 66

Slide 66 text

Bayesian filtering

Slide 67

Slide 67 text

What's the probability of an event, based on conditions that might be related to the event?

Slide 68

Slide 68 text

What is the chance that a message is spam when it contains certain words?

Slide 69

Slide 69 text

P(A|B) = P(B|A) · P(A) / P(B)

P(A|B) = probability of event A, if event B (conditional)
P(A) = probability of event A
P(B) = probability of event B
P(B|A) = probability of event B, if event A

Slide 70

Slide 70 text

➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

Slide 71

Slide 71 text

➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".

Slide 72

Slide 72 text

Venn diagram: "negative" (10 comments), "horrible" (25 comments), overlap (8 comments)
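Plugging the comment counts into Bayes' theorem, with A = "comment is negative" and B = "comment contains the word horrible", can be sketched as follows. The variable names are illustrative, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

total_comments = 50
negative = 10            # comments marked "negative"
horrible = 25            # comments containing "horrible"
negative_and_horrible = 8

p_negative = Fraction(negative, total_comments)                      # P(A)
p_horrible = Fraction(horrible, total_comments)                      # P(B)
p_horrible_given_negative = Fraction(negative_and_horrible, negative)  # P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_negative_given_horrible = p_horrible_given_negative * p_negative / p_horrible

print(p_negative_given_horrible)         # 8/25
print(float(p_negative_given_horrible))  # 0.32
```

The result matches the direct count: 8 of the 25 comments containing "horrible" are negative.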

Slide 73

Slide 73 text


Slide 74

Slide 74 text


Slide 75

Slide 75 text

➡ You might want to filter stop-words first.
➡ You might want to make sure negatives are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.

Slide 76

Slide 76 text

➡ Collaborative filtering (Mahout):
➡ If a user likes products A, B and C, what is the chance that they like product D?

Slide 77

Slide 77 text

Mess up your (training) data, and nothing can save you (except a training set reboot)

Slide 78

Slide 78 text


Slide 79

Slide 79 text

Find me on twitter: @jaytaph
Find me for development and training: www.noxlogic.nl / www.techademy.nl
Find me on email: [email protected]