Joshua Thijssen
June 21, 2016

# Paradoxes and theorems every developer should know

## Transcript

1. 1
Joshua Thijssen
jaytaph
namespace

2. 2
Joshua Thijssen
Consultant and trainer @ NoxLogic
Founder of TechAnalyze.io
Symfony Rainbow Books author
Mastering the SPL author
Email: [email protected]
WWW.TECHANALYZE.IO

3. 3
https://dutchtechrecruitment.nl/

4. Disclaimer:
I am neither a scientist nor a mathematician.

5. German Tank Problem

6. 6

7. 6
15

8. 7

9. Observed serial numbers: 53, 72, 8, 15

10. 8
k = number of elements
m = largest number

11. 72 + (72 / 4) - 1 = 89
9
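The arithmetic on this slide follows from the two definitions above (k = number of elements, m = largest number) and the estimate m + m/k - 1. A minimal Python sketch, assuming serial numbers start at 1 and are sampled without repeats; the function name is made up for illustration:

```python
def estimate_population(serials):
    """Estimate the total count from a sample of observed serial numbers."""
    k = len(serials)  # number of elements observed
    m = max(serials)  # largest number observed
    return m + m / k - 1


# The four numbers from the earlier slide: 72 + (72 / 4) - 1
print(estimate_population([53, 72, 8, 15]))  # 89.0
```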

15. https://en.wikipedia.org/wiki/German_tank_problem

| Month | Intelligence | Statistics | Actual |
|---|---|---|---|
| June 1940 | 1000 | 169 | 122 |
| June 1941 | 1550 | 244 | 271 |
| August 1942 | 1550 | 327 | 342 |

16. 11

20. ➡ Data leakage.
➡ User IDs, invoice IDs, etc.
➡ Used to approximate the number of iPhones sold in 2008.
➡ Approximate the size of a dataset from (incomplete) information.

21. 12

22. ➡ Avoid leaking (semi-)sequential data.
➡ Adding randomness and offsets will NOT solve the issue.
➡ Use UUIDs
(better: time-based short IDs; you don't need UUIDs)
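As a rough illustration of the "Use UUIDs" advice, the sketch below hands out random (version 4) UUIDs as public identifiers, using Python's standard `uuid` module. The function name is hypothetical, and this shows the random variant only, not the time-based short IDs the slide prefers.

```python
import uuid


def new_public_id():
    # A version 4 UUID is random: it exposes no counter and no creation order,
    # so a handful of leaked IDs says nothing about how many records exist.
    return str(uuid.uuid4())


print(new_public_id())
```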

23. 14
Collecting (big) data is easy
Analyzing big data is the hard part.

24. Confirmation Bias

25. 2 4 6
16
Z={…,−2,−1,0,1,2,…}

26. 21%
17

27. 18
5 8 ? ?
If a card shows an even number on one face,
then its opposite face is blue.

28. < 10%
19

29. Cards: coke, beer, 35, 17
Rule: If you drink beer, then you must be 18 yrs or older.
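A rule of the form "if P then Q" can only be broken by a card that shows P or a card that shows not-Q, so those are the only cards worth turning over. A small Python sketch of that check for the four cards on this slide; all function names are made up for illustration:

```python
def cards_to_check(cards, shows_p, shows_not_q):
    """For a rule 'if P then Q', keep the cards showing P or showing not-Q."""
    return [card for card in cards if shows_p(card) or shows_not_q(card)]


def drinks_beer(card):
    return card == "beer"


def is_under_18(card):
    return isinstance(card, int) and card < 18


cards = ["coke", "beer", 35, 17]
print(cards_to_check(cards, drinks_beer, is_under_18))  # ['beer', 17]
```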

for social exchange
21

33. hint:
problem" in a more social context.
22

34. BDD
23

35. 24
5 8 ? ?
If a card shows an even number on one face,
then its opposite face is blue.

38. TESTING
25

39. 26
➡ Step 1: Write code
➡ Step 2: Write tests
➡ Step 3: Profit

40. public function isLeapYeap($year) {
    return ($year % 4 == 0);
}
https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing
testIs1996ALeapYeap();
testIs2000ALeapYeap();
testIs2004ALeapYeap();
testIs2008ALeapYeap();
testIs2012ALeapYeap();
testIs1997NotALeapYear();
testIs1998NotALeapYear();
testIs2001NotALeapYear();
testIs2013NotALeapYear();

42. public function isLeapYeap($year) {
    return ($year % 4 == 0);
}
https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing

43. ➡ Tests were written based on the actual code.
➡ Tests were written to CONFIRM the actual code, not to DISPROVE it!
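To make the point concrete: every year in the test list passes with the divisible-by-4 rule, because none of them is a century year. The Python sketch below (a translation for illustration, not the original PHP) puts the slide's rule next to the full Gregorian rule and picks years that try to disprove the code.

```python
def is_leap_year_naive(year):
    # Same rule as the function on the slide: divisible by 4.
    return year % 4 == 0


def is_leap_year(year):
    # Gregorian rule: every 4th year, except century years,
    # unless the century year is divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Years picked to DISPROVE the code: century boundaries.
for year, expected in [(1900, False), (2000, True), (2100, False)]:
    print(year, is_leap_year_naive(year) == expected, is_leap_year(year) == expected)
```

The naive version fails for 1900 and 2100, which the confirming tests never exercised.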

44. 30
TDD

45. ➡ Step 1: Write tests
➡ Step 2: Write code
➡ Step 3: Profit, as this is less prone to confirmation bias (there is nothing to bias!)

32

47. Question:
33
> 50% chance
4 March
18 September
5 December
25 July
2 February
9 October

48. 23 people
34

49. 366 persons = 100%
35
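Both numbers (23 people for more than 50%, 366 people for 100%) can be checked with a few lines of Python. This is a sketch that assumes 365 equally likely birthdays and ignores leap days; the function name is illustrative.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


for n in (22, 23, 366):
    print(n, round(shared_birthday_probability(n), 4))
```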

50. Collisions occur more often than you realize

51. Hash collisions
37

52. A 16-bit hash means roughly 300 values before the collision probability exceeds 50%
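The same birthday calculation applies to hashes: replace the 365 days with the 2^16 possible hash values. A Python sketch, assuming hash values are uniformly distributed (real hashes only approximate this); the function name is hypothetical.

```python
def values_until_collision_likely(bits, threshold=0.5):
    """Smallest number of hashed values whose collision chance exceeds threshold."""
    space = 2 ** bits
    p_no_collision = 1.0
    n = 0
    while 1.0 - p_no_collision <= threshold:
        # Adding one more value: it must avoid the n values already present.
        p_no_collision *= (space - n) / space
        n += 1
    return n


print(values_until_collision_likely(16))  # on the order of 300
```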

53. Watch out for:
➡ Hashes that are too small.
➡ Unique data.
➡ Your data might be less "protected" than you might think.

54. Heisenberg uncertainty principle

star trek
(heisenberg compensators)
41

56. nor crystal meth
42

57. 43
x position
p momentum (mass x velocity)
ħ 0.0000000000000000000000000000000001054571800 (1.054571800E-34)
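The relation that ties these three symbols together is not in the transcript; in its usual textbook form, with σ denoting the standard deviation (uncertainty) of each quantity, it reads:

$$
\sigma_x \, \sigma_p \ge \frac{\hbar}{2}
$$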

58. The more precisely you know one property, the less precisely you know the other.

observing!
45

60. Observer effect
46
heisenbug

47

62. Benford's law
48

63. Numbers beginning with 1 are more common than numbers beginning with 9.

64. Default behavior for
natural numbers.
50

65. 51

67. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c
1073 1
886 2
636 3
372 4
352 5
350 6
307 7
247 8
222 9
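For comparison, Benford's law predicts that a leading digit d occurs with probability log10(1 + 1/d). The Python sketch below sets that prediction next to the counts printed above; the counts are copied from the slide, the variable and function names are illustrative.

```python
import math

# Leading-digit counts from the command output above (digit -> count).
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}


def benford_expected(digit):
    """Share of numbers Benford's law predicts to start with `digit`."""
    return math.log10(1 + 1 / digit)


total = sum(observed.values())
for digit, count in observed.items():
    print(f"{digit}: observed {count / total:.1%}, Benford {benford_expected(digit):.1%}")
```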

68. 53

69. Bayesian filtering
54

70. What's the probability of an event, based on conditions that might be related to the event?

71. What is the chance that a message is spam when it contains certain words?

72. P(A|B): probability of event A, given event B (conditional)
P(A): probability of event A
P(B): probability of event B
P(B|A): probability of event B, given event A

73. 58
➡ Figure out the probability a {mail, tweet,
comment, review} is {spam, negative} etc.

74. ➡ 10 out of 50 comments are "negative".
➡ 25 out of 50 comments use the word "horrible".
➡ 8 comments with the word "horrible" are marked as "negative".
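Plugging these three counts into Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), gives the chance that a comment containing "horrible" is negative. A short Python sketch with A = "negative" and B = "contains horrible"; the names are illustrative:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


total = 50
negative = 10
horrible = 25
horrible_and_negative = 8

p_negative = negative / total                               # P(A)   = 0.2
p_horrible = horrible / total                               # P(B)   = 0.5
p_horrible_given_negative = horrible_and_negative / negative  # P(B|A) = 0.8

p = bayes(p_horrible_given_negative, p_negative, p_horrible)
print(f"{p:.2f}")  # 0.32
```

This matches counting directly: 8 of the 25 "horrible" comments are negative.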

75. 60
negative
"horrible"

76. 61

77. ➡ More words?
➡ Complex algorithm,
➡ but we can assume that words are independent of each other
➡ Naive Bayes approach

78. 63

79. 64
We must know
beforehand which
negative?

80. TRAINING SET
65

81. Negative: "Your product is horrible and does not work properly. Also, you suck."
Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"
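A toy Naive Bayes classifier trained on exactly these two comments might look like the Python sketch below. It is an illustration only: the tokenizer, the add-one (Laplace) smoothing and the equal priors are assumptions made here, and a two-comment training set is far too small for real use.

```python
import math
import re
from collections import Counter

# Training set: the two example comments from the slide.
TRAINING = {
    "negative": ["Your product is horrible and does not work properly. Also, you suck."],
    "positive": ["I had a horrible experience with another product. "
                 "But yours really worked well. Thank you!"],
}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(training):
    """Count how often each word occurs per label."""
    counts = {label: Counter() for label in training}
    for label, docs in training.items():
        for doc in docs:
            counts[label].update(tokenize(doc))
    return counts


def classify(text, counts):
    vocabulary = set().union(*counts.values())
    scores = {}
    for label, words in counts.items():
        total = sum(words.values())
        score = 0.0  # equal priors assumed, so only word likelihoods are added
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((words[word] + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


counts = train(TRAINING)
print(classify("you suck", counts))                # negative
print(classify("worked well, thank you", counts))  # positive
```

Summing log-probabilities per word is where the "naive" independence assumption enters: each word contributes on its own, regardless of its neighbours.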

82. ➡ You might want to filter stop-words first.
➡ You might want to make sure negations are handled properly: "not great" => negative.
➡ Bonus points if you can spot sarcasm.

83. ➡ Collaborative filtering (Mahout):
➡ If a user likes products A, B and C, what is the chance that they like product D?

84. 69
Mess up your (training) data, and nothing can save you
(except a training set reboot)

86. Binomial probability
➡ 30% chance of acceptance for a CFP
➡ 5 CFPs
1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
83% chance of getting selected at least once!
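The calculation generalizes to any acceptance rate and number of submissions: the chance of at least one success is one minus the chance that every attempt fails. A Python sketch, assuming the submissions are independent; the function name is made up for illustration.

```python
def at_least_one_success(p, attempts):
    """Chance of one or more successes in `attempts` independent tries."""
    return 1 - (1 - p) ** attempts


print(f"{at_least_one_success(0.3, 5):.3f}")  # 0.832
```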

87. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

88. 72