Paradoxes and theorems every developer should know

Joshua Thijssen
December 08, 2015

Transcript

  1. Joshua Thijssen
    jaytaph
    namespace

  2. Disclaimer:
    I'm not a (mad) scientist nor a mathematician.

  3. Second disclaimer:
    I will only tell lies

  4. German Tank Problem

  8. Sample: 53, 72, 8, 15

  9. N ≈ m + m/k - 1
    k = number of elements
    m = largest number

  10. 72 + (72 / 4) - 1 = 89
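
The calculation on slide 10 works for any sample. Below is a minimal sketch, assuming the sample is drawn without replacement from items numbered 1 to N. The function name `estimatePopulationSize` is made up for this illustration and does not appear in the talk.

```php
<?php
// Illustrative helper (the name is an assumption, not from the talk).
// Estimates the highest number N in a sequence 1..N from a sample,
// using the slide's arithmetic: m + m/k - 1.
function estimatePopulationSize(array $sample): float
{
    $k = count($sample);          // k = number of elements
    if ($k === 0) {
        throw new InvalidArgumentException('The sample must not be empty.');
    }
    $m = max($sample);            // m = largest number

    return $m + ($m / $k) - 1;
}

// The sample used on the slides.
echo estimatePopulationSize([53, 72, 8, 15]), PHP_EOL; // prints 89
```

More observations tighten the estimate, because the correction term m/k shrinks as k grows.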

  14. https://en.wikipedia.org/wiki/German_tank_problem
    Month        Intelligence  Statistics  Actual
    June 1940    1000          169         122
    June 1941    1550          244         271
    August 1942  1550          327         342

  19. ➡ Data leakage.
    ➡ User IDs, invoice IDs, etc.
    ➡ Used to approximate the number of iPhones sold in 2008.
    ➡ Calculate approximations of datasets with (incomplete) information.

  20. ➡ Avoid leaking (semi-)sequential data.
    ➡ Adding randomness and offsets will NOT solve the issue.
    ➡ Use UUIDs
    (better: time-based short IDs, you don't need UUIDs)
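
One way to hand out identifiers that reveal nothing about volume is a random (version 4) UUID. The sketch below assumes PHP 7 or later for `random_bytes()`, and the helper name `uuidV4` is hypothetical. The time-based short IDs that the slide prefers are not shown, because the talk does not specify a scheme for them.

```php
<?php
// Hypothetical helper: builds a random (version 4) UUID from 16 random bytes.
function uuidV4(): string
{
    $bytes = random_bytes(16);

    // Set the four version bits to 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the two variant bits to 10 (RFC 4122 variant).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters, split into 8 groups of 4, joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), PHP_EOL;
```

Apart from the fixed version and variant bits, every bit is random, so two consecutive identifiers say nothing about how many were issued in between.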

  21. Collecting (big) data is easy.
    Analyzing big data is the hard part.

  22. Confirmation Bias

  23. 2 4 6
    Z={…,−2,−1,0,1,2,…}

  24. 21%

  25. 5 8 ? ?
    If a card shows an even number on one face, then its opposite face is blue.

  26. < 10%

  27. coke beer 35 17
    If you drink beer then you must be 18 yrs or older.

  30. Cognitive adaptation for social exchange

  31. Hint:
    Try to place your "technical problem" in a more social context.

  32. BDD

  33. 5 8 ? ?
    If a card shows an even number on one face, then its opposite face is blue.

  36. TESTING

  37. ➡ Step 1: Write code
    ➡ Step 2: Write tests
    ➡ Step 3: Profit

  38. public function isLeapYear($year) {
        return ($year % 4 == 0);
    }
    https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing
    testIs1996ALeapYear();
    testIs2000ALeapYear();
    testIs2004ALeapYear();
    testIs2008ALeapYear();
    testIs2012ALeapYear();
    testIs1997NotALeapYear();
    testIs1998NotALeapYear();
    testIs2001NotALeapYear();
    testIs2013NotALeapYear();

  41. ➡ Tests were written based on actual code.
    ➡ Tests were written to CONFIRM actual code, not to DISPROVE actual code!
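
Tests meant to disprove the implementation probe the edges of the rule instead of repeating cases that already pass. The sketch below assumes the Gregorian calendar rule, where century years are leap years only when divisible by 400. Both function names are chosen for this illustration.

```php
<?php
// Same logic as the implementation on the slide: divisible by four.
function isLeapYearNaive(int $year): bool
{
    return $year % 4 === 0;
}

// Full Gregorian rule: century years must also be divisible by 400.
function isLeapYear(int $year): bool
{
    return ($year % 4 === 0 && $year % 100 !== 0) || $year % 400 === 0;
}

// Years picked to break the "divisible by four" assumption,
// plus two ordinary years as a sanity check.
$cases = [1900 => false, 2000 => true, 2100 => false, 2012 => true, 2013 => false];

foreach ($cases as $year => $expected) {
    printf(
        "%d: naive=%s, gregorian=%s, expected=%s\n",
        $year,
        var_export(isLeapYearNaive($year), true),
        var_export(isLeapYear($year), true),
        var_export($expected, true)
    );
}
```

The years 1900 and 2100 are where the naive version disagrees with the expected result. None of the nine tests on the slide touches a century year other than 2000.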

  42. TDD

  43. ➡ Step 1: Write tests
    ➡ Step 2: Write code
    ➡ Step 3: Profit, as this is less prone to confirmation bias (there is nothing to bias!)

  44. Birthday paradox

  45. Question:
    > 50% chance
    4 March
    18 September
    5 December
    25 July
    2 February
    9 October

  46. 23 people

  47. 366 people = 100%

  48. Collisions occur more often than you realize
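
The figure of 23 people can be checked by multiplying the chances that each new person misses all earlier birthdays. This is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap days. The function name `collisionProbability` is an assumption for this example.

```php
<?php
// Hypothetical helper: chance that at least two out of $people share
// one of $days equally likely values.
function collisionProbability(int $people, int $days = 365): float
{
    if ($people > $days) {
        return 1.0; // more people than days: a shared value is certain
    }

    $allDistinct = 1.0;
    for ($i = 0; $i < $people; $i++) {
        // Person number $i + 1 must avoid the $i values already taken.
        $allDistinct *= ($days - $i) / $days;
    }

    return 1.0 - $allDistinct;
}

printf("%.3f\n", collisionProbability(22)); // 0.476
printf("%.3f\n", collisionProbability(23)); // 0.507
```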

  49. Hash collisions

  50. 16 bits means about 300 values before >50% collision probability

  51. Watch out for:
    ➡ Hashes that are too small.
    ➡ Unique data.
    ➡ Your data might be less "protected" than you might think.
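
The 16-bit figure follows from the common square-root approximation of the birthday bound: n ≈ sqrt(2 · N · ln(1 / (1 − p))) for N possible hash values. The sketch below applies that approximation, not an exact computation, and assumes a uniformly distributed hash. The helper name `valuesUntilCollision` is hypothetical.

```php
<?php
// Hypothetical helper: approximate number of hashed values after which the
// chance of at least one collision reaches $probability.
function valuesUntilCollision(int $bits, float $probability = 0.5): float
{
    $space = 2 ** $bits; // number of possible hash values

    return sqrt(2 * $space * log(1 / (1 - $probability)));
}

foreach ([16, 32] as $bits) {
    printf("%2d bits: about %.0f values\n", $bits, valuesUntilCollision($bits));
}
```

For 16 bits this lands at roughly 300, in line with the slide. Each doubling of the hash length squares the size of the space, so it only squares the number of values you can store before collisions become likely.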

  52. Heisenberg uncertainty principle

  53. It's not about Star Trek
    (Heisenberg compensators)

  54. nor crystal meth

  55. x position
    p momentum (mass x velocity)
    ħ 0.0000000000000000000000000000000001054571800 (1.054571800E-34)

  56. The more precisely you know one property, the less precisely you know the other.

  57. This is NOT about observing!

  58. Observer effect
    heisenbug

  59. It's about trade-offs

  60. Benford's law

  61. Numbers beginning with 1 are more common than numbers beginning with 9.

  62. Default behavior for natural numbers.

  65. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c
    1073 1
    886 2
    636 3
    372 4
    352 5
    350 6
    307 7
    247 8
    222 9
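
Benford's law predicts that a leading digit d occurs with probability log10(1 + 1/d). The sketch below compares that prediction with the first-digit counts from the `find` output in the slides. The function name `benfordProbability` is an assumption for this example.

```php
<?php
// Benford's law: expected share of numbers whose first digit is $digit (1-9).
function benfordProbability(int $digit): float
{
    return log10(1 + 1 / $digit);
}

// First-digit counts copied from the `uniq -c` output in the slides.
$observed = [
    1 => 1073, 2 => 886, 3 => 636, 4 => 372, 5 => 352,
    6 => 350, 7 => 307, 8 => 247, 9 => 222,
];
$total = array_sum($observed);

foreach ($observed as $digit => $count) {
    printf(
        "%d  observed %4.1f%%  expected %4.1f%%\n",
        $digit,
        100 * $count / $total,
        100 * benfordProbability($digit)
    );
}
```

The observed shares fall steadily from 1 to 9, as the law predicts, although a single code base is a small sample and will not match the expected percentages exactly.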

  67. Bayesian filtering

  68. What's the probability of an event, based on conditions that might be related to the event?

  69. What is the chance that a message is spam when it contains certain words?

  70. P(A|B)  Probability of event A, if event B (conditional)
    P(A)    Probability of event A
    P(B)    Probability of event B
    P(B|A)  Probability of event B, if event A

  71. ➡ Figure out the probability a {mail, tweet, comment, review} is {spam, negative} etc.

  72. ➡ 10 out of 50 comments are "negative".
    ➡ 25 out of 50 comments use the word "horrible".
    ➡ 8 comments with the word "horrible" are marked as "negative".

  73. Diagram: negative = 10 comments, "horrible" = 25 comments, both = 8 comments
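
With the comment counts above, Bayes' theorem P(A|B) = P(B|A) · P(A) / P(B) gives the chance that a comment containing "horrible" is negative. This is a minimal sketch: the function name `bayes` is an assumption, and A stands for "negative" while B stands for "contains horrible".

```php
<?php
// Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
function bayes(float $pBGivenA, float $pA, float $pB): float
{
    return $pBGivenA * $pA / $pB;
}

// Numbers from the slides: 50 comments, 10 negative, 25 containing
// "horrible", and 8 that are both negative and contain "horrible".
$pNegative = 10 / 50;              // P(A)
$pHorrible = 25 / 50;              // P(B)
$pHorribleGivenNegative = 8 / 10;  // P(B|A)

printf("%.2f\n", bayes($pHorribleGivenNegative, $pNegative, $pHorrible)); // 0.32
```

The result matches the direct count: 8 of the 25 comments containing "horrible" are negative, which is 32%.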

  75. ➡ More words?
    ➡ Complex algorithm,
    ➡ but we can assume that words are independent of each other
    ➡ Naive Bayes approach

  77. We must know beforehand which comments are negative?

  78. TRAINING SET

  79. Negative:
    "Your product is horrible and does not work properly. Also, you suck."
    Positive:
    "I had a horrible experience with another product. But yours really worked well. Thank you!"

  80. $trainingset = [
        'negative' => [
            'count' => 1,
            'words' => [
                'product' => 1,
                'horrible' => 1,
                'properly' => 1,
                'suck' => 1,
            ],
        ],
        'positive' => [
            'count' => 1,
            'words' => [
                'horrible' => 1,
                'experience' => 1,
                'product' => 1,
                'thank' => 1,
            ],
        ],
    ];

  81. $trainingset = [
        'negative' => [
            'count' => 631,
            'words' => [
                'product' => 521,
                'horrible' => 52,
                'properly' => 36,
                'suck' => 272,
            ],
        ],
        'positive' => [
            'count' => 1263,
            'words' => [
                'horrible' => 62,
                'experience' => 16,
                'product' => 311,
                'great' => 363,
                'thank' => 63,
            ],
        ],
    ];
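
A training set shaped like the one above can drive a small Naive Bayes classifier. The sketch below is not the speaker's implementation. It assumes that each word count is the number of comments of that class containing the word, it applies add-one smoothing, and it skips words that never occur in the training set. The function name `classify` is hypothetical.

```php
<?php
// Counts copied from the larger training set in the slides.
$trainingset = [
    'negative' => [
        'count' => 631,
        'words' => ['product' => 521, 'horrible' => 52, 'properly' => 36, 'suck' => 272],
    ],
    'positive' => [
        'count' => 1263,
        'words' => ['horrible' => 62, 'experience' => 16, 'product' => 311, 'great' => 363, 'thank' => 63],
    ],
];

// Hypothetical classifier: returns the label with the highest score.
function classify(string $text, array $trainingset): string
{
    $total = array_sum(array_column($trainingset, 'count'));

    // Only words that occur somewhere in the training set carry information.
    $vocabulary = [];
    foreach ($trainingset as $data) {
        $vocabulary += $data['words'];
    }
    preg_match_all('/[a-z]+/', strtolower($text), $matches);
    $words = array_intersect(array_unique($matches[0]), array_keys($vocabulary));

    $bestLabel = '';
    $bestScore = -INF;
    foreach ($trainingset as $label => $data) {
        // Start with the log of the prior P(label); summing logs avoids
        // underflow from multiplying many small probabilities.
        $score = log($data['count'] / $total);
        foreach ($words as $word) {
            // Add-one smoothing so a word unseen in this class never gives log(0).
            $seen = $data['words'][$word] ?? 0;
            $score += log(($seen + 1) / ($data['count'] + 2));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestLabel = $label;
        }
    }

    return $bestLabel;
}

echo classify('This product does suck', $trainingset), PHP_EOL;   // negative
echo classify('Great product, thank you', $trainingset), PHP_EOL; // positive
```

Treating every word as an independent piece of evidence is the "naive" part: the scores are simply summed per word, which keeps the algorithm cheap even for large vocabularies.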

  82. ➡ You might want to filter stop-words first.
    ➡ You might want to make sure negations are handled properly: "not great" => negative.
    ➡ Bonus points if you can spot sarcasm.

  83. ➡ Collaborative filtering (Mahout):
    ➡ If a user likes products A, B and C, what is the chance that they like product D?

  84. Mess up your (training) data, and nothing can save you
    (except a training set reboot)

  85. ➡ Binomial probability

  87. ➡ 30% chance of acceptance for a CFP
    ➡ 5 CFPs
    1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
    83% chance of getting selected at least once!
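
The CFP calculation is a special case of the binomial distribution: the chance of at least one acceptance is one minus the chance of exactly zero. The sketch below assumes the five submissions are independent and share the same 30% acceptance rate. Both function names are chosen for this illustration.

```php
<?php
// Number of ways to choose $k items out of $n.
function binomialCoefficient(int $n, int $k): float
{
    $result = 1.0;
    for ($i = 1; $i <= $k; $i++) {
        $result *= ($n - $k + $i) / $i;
    }

    return $result;
}

// Probability of exactly $k successes in $n independent tries,
// each succeeding with probability $p.
function binomialProbability(int $n, int $k, float $p): float
{
    return binomialCoefficient($n, $k) * $p ** $k * (1 - $p) ** ($n - $k);
}

$noAcceptance = binomialProbability(5, 0, 0.3);
printf("No acceptance: %.3f\n", $noAcceptance);                   // 0.168
printf("At least once: %.3f\n", 1 - $noAcceptance);               // 0.832
printf("Exactly once:  %.3f\n", binomialProbability(5, 1, 0.3));  // 0.360
```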

  88. Ockham's Razor

  89. Among competing hypotheses, the one with the fewest assumptions should be selected.

  90. Everything should be made as simple as possible, but no simpler.

  91. YAGNI

  92. Actually,
    ➡ The principle of plurality
    Plurality should not be posited without necessity.
    ➡ The principle of parsimony
    It is pointless to do with more what is done with less.

  93. ➡ Every element you add needs: design, development, maintenance, connectivity, support, etc.
    ➡ When "adding" elements, you are not adding, you are multiplying!

  94. Food for thought:
    Would Ockham accept a Service Oriented Architecture?

  95. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

  96. Find me on twitter: @jaytaph
    Find me for development and training:
    www.noxlogic.nl / www.techademy.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl