Paradoxes and theorems every developer should know

Joshua Thijssen

June 21, 2016

Transcript

  1. 1
    Joshua Thijssen
    jaytaph
    namespace

  2. 2
    Joshua Thijssen
    Consultant and trainer @ NoxLogic
    Founder of TechAnalyze.io
    Symfony Rainbow Books author
    Mastering the SPL author
    Blog: http://adayinthelifeof.nl
    Email: [email protected]
    Twitter: @jaytaph
    WWW.TECHANALYZE.IO

  3. 3
    https://dutchtechrecruitment.nl/

  4. Disclaimer:
    I'm not a (mad)
    scientist nor a
    mathematician.
    4

  5. German Tank
    Problem
    5

  6. 6

  7. 6
    15

  8. 7

  9. 7
    53
    72
    8
    15

  10. 8
    k = number of observed elements (sample size)
    m = largest observed number

  11. N ≈ m + (m / k) - 1 = 72 + (72 / 4) - 1 = 89
    9
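The estimate on this slide follows the usual frequentist formula for the German tank problem, N ≈ m + m/k - 1. A minimal Python sketch, assuming the four serial numbers shown two slides earlier (53, 72, 8, 15); the function name `estimate_total` is made up for illustration:

```python
def estimate_total(serials):
    """Estimate the population size from observed serial numbers (m + m/k - 1)."""
    k = len(serials)   # number of observed elements
    m = max(serials)   # largest observed number
    return m + m / k - 1


print(estimate_total([53, 72, 8, 15]))  # 89.0
```

The estimate only needs the sample size and the maximum, which is why a handful of leaked sequential numbers is already enough.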

  15. 10
    Month        Intelligence estimate  Statistical estimate  Actual
    June 1940    1000                   169                   122
    June 1941    1550                   244                   271
    August 1942  1550                   327                   342
    https://en.wikipedia.org/wiki/German_tank_problem

  20. 11
    ➡ Data leakage.
    ➡ User IDs, invoice IDs, etc.
    ➡ Used to approximate the number of
    iPhones sold in 2008.
    ➡ Estimate the size of a dataset from
    (incomplete) information.

  21. 12

  22. ➡ Avoid leaking (semi-)sequential data.
    ➡ Adding randomness and offsets will NOT
    solve the issue.
    ➡ Use UUIDs
    (better: time-based short IDs; you don't need UUIDs)
    13
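A short sketch of why a fixed offset does not help, and what a random identifier looks like instead. The leaked invoice IDs below are invented for illustration; `uuid.uuid4()` comes from Python's standard `uuid` module. This shows plain random UUIDs only, not the time-based short IDs the slide prefers.

```python
import uuid

# Hypothetical leaked invoice IDs: sequential numbers hidden behind a fixed offset.
leaked = [100_053, 100_072, 100_008, 100_015]

# The offset cancels out in the spread, so a lower bound on the volume still leaks.
at_least = max(leaked) - min(leaked) + 1
print(at_least)  # 65

# A random (version 4) UUID has no order, so two IDs say nothing about volume.
invoice_id = uuid.uuid4()
print(invoice_id)
```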

  23. 14
    Collecting (big) data is easy
    Analyzing big data is the hard part.

  24. Confirmation Bias
    15

  25. 2 4 6
    16
    Z={…,−2,−1,0,1,2,…}

  26. 21%
    17

  27. 18
    5 8 ? ?
    If a card shows an even number on one face,
    then its opposite face is blue.

  28. < 10%
    19

  29. 20
    coke beer 35 17
    If you drink beer
    then you must be 18 yrs or older.
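Which of the four cards can actually disprove the rule? A small Python sketch: each card shows one side (a drink or an age), and only a visible "beer" or a visible age under 18 could hide a violation. The helper name `must_check` is made up for illustration.

```python
def must_check(visible):
    """Return True if the hidden side of this card could break the rule
    'if you drink beer, then you must be 18 or older'."""
    if isinstance(visible, int):   # the card shows an age
        return visible < 18        # only a minor could be drinking beer
    return visible == "beer"       # only a beer drinker could be a minor


cards = ["coke", "beer", 35, 17]
print([card for card in cards if must_check(card)])  # ['beer', 17]
```

The coke drinker and the 35-year-old can never violate the rule, so turning their cards only confirms what is already known.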

  32. Cognitive adaptation
    for social exchange
    21

  33. Hint:
    Try to place your "technical
    problem" in a more social context.
    22

  34. BDD
    23

  35. 24
    5 8 ? ?
    If a card shows an even number on one face,
    then its opposite face is blue.

  38. TESTING
    25

  39. 26
    ➡ Step 1: Write code
    ➡ Step 2: Write tests
    ➡ Step 3: Profit

  40. public function isLeapYear($year) {
        return ($year % 4 == 0);
    }
    testIs1996ALeapYear();
    testIs2000ALeapYear();
    testIs2004ALeapYear();
    testIs2008ALeapYear();
    testIs2012ALeapYear();
    testIs1997NotALeapYear();
    testIs1998NotALeapYear();
    testIs2001NotALeapYear();
    testIs2013NotALeapYear();
    https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing

  42. public function isLeapYear($year) {
    return ($year % 4 == 0);
    }
    28
    https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing

  43. 29
    ➡ Tests were written based on the actual code.
    ➡ Tests were written to CONFIRM the actual
    code, not to DISPROVE it!
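A sketch of tests that try to disprove the code rather than confirm it. It is written in Python rather than the PHP of the slides and assumes the Gregorian rule (divisible by 4, except century years not divisible by 400). `is_leap_year_naive` mirrors the function from the slide; both names are made up for illustration.

```python
def is_leap_year_naive(year):
    # Same logic as the function on the slide: passes every test listed there.
    return year % 4 == 0


def is_leap_year(year):
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Cases chosen to break the naive version: century years.
for year, expected in [(1900, False), (2000, True), (2100, False)]:
    print(year, is_leap_year_naive(year) == expected, is_leap_year(year) == expected)
```

None of the years tested on the slide is a century year other than 2000, which is why the naive version was never caught.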

  44. 30
    TDD

  45. 31
    ➡ Step 1: Write tests
    ➡ Step 2: Write code
    ➡ Step 3: Profit, as this is less prone to confirmation
    bias (there is no code yet to bias the tests!)

  46. Birthday paradox
    32

  47. Question: how many people do you need for a
    > 50% chance that two of them share a birthday?
    4 March
    18 September
    5 December
    25 July
    2 February
    9 October

  48. 23 people
    34
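The number 23 can be checked with a few lines of Python. This sketch assumes 365 equally likely birthdays and ignores leap days; the function name is made up for illustration.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday (uniform days)."""
    p_all_different = 1.0
    for i in range(people):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


print(round(shared_birthday_probability(22), 3))  # 0.476
print(round(shared_birthday_probability(23), 3))  # 0.507
```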

  49. 366 people = 100% (ignoring leap days)
    35

  50. Collisions occur more
    often than you realize
    36

  51. Hash collisions
    37

  52. A 16-bit hash needs only about
    300 values before a >50%
    collision probability
    38
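The figure of roughly 300 values can be reproduced with the same reasoning as the birthday paradox. A minimal Python sketch, assuming hash values are uniformly distributed over 2^bits possibilities; the function name is made up for illustration.

```python
def values_until_collision_likely(bits, threshold=0.5):
    """Smallest number of random hashes for which a collision
    is more likely than `threshold`."""
    space = 2 ** bits
    p_no_collision = 1.0
    n = 0
    while 1.0 - p_no_collision <= threshold:
        # Probability that the next value avoids the n values seen so far.
        p_no_collision *= (space - n) / space
        n += 1
    return n


print(values_until_collision_likely(16))
```

The result grows with the square root of the hash space, so every extra 2 bits only doubles the number of values you can store before collisions become likely.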

  53. Watch out for:
    39
    ➡ Hashes that are too small.
    ➡ Unique data.
    ➡ Your data might be less "protected" than
    you might think.

  54. Heisenberg
    uncertainty
    principle
    40

  55. It's not about
    Star Trek
    (Heisenberg compensators)
    41

  56. nor crystal meth
    42

  57. 43
    x position
    p momentum (mass × velocity)
    ħ reduced Planck constant, 1.054571800E-34 J·s
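With the symbols listed on this slide, the uncertainty relation is commonly written in the standard textbook form below, where Δx and Δp stand for the uncertainty (standard deviation) in position and momentum:

```latex
\Delta x \, \Delta p \ge \frac{\hbar}{2}
```

Because ħ is so small, the bound only matters at atomic scales; the point for developers is the shape of the trade-off, not the number.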

  58. The more precisely you
    know one property, the less
    precisely you can know the other.
    44

  59. This is NOT about
    observing!
    45

  60. Observer effect
    46
    heisenbug

  61. It's about trade-offs
    47

  62. Benford's law
    48

  63. Numbers beginning with 1 are
    more common than numbers
    beginning with 9.
    49

  64. Typical behavior for
    naturally occurring numbers.
    50

  65. 51

  67. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c
    1073 1
    886 2
    636 3
    372 4
    352 5
    350 6
    307 7
    247 8
    222 9
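Benford's law predicts that a leading digit d occurs with probability log10(1 + 1/d). The Python sketch below compares that prediction with the counts from the output on this slide; the function name `benford_share` is made up for illustration.

```python
import math


def benford_share(digit):
    """Expected share of numbers whose leading digit is `digit`."""
    return math.log10(1 + 1 / digit)


# Counts per leading digit, copied from the command output on the slide.
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}
total = sum(observed.values())

for digit, count in observed.items():
    print(f"{digit}: observed {count / total:.1%}, expected {benford_share(digit):.1%}")
```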

  68. 53

  69. Bayesian filtering
    54

  70. What is the probability of an
    event, given conditions that
    might be related to the event?
    55

  71. What is the chance that a
    message is spam when it
    contains certain words?
    56

  72. 57
    P(A|B)  Probability of event A, given event B (conditional)
    P(A)    Probability of event A
    P(B)    Probability of event B
    P(B|A)  Probability of event B, given event A (conditional)

  73. 58
    ➡ Figure out the probability a {mail, tweet,
    comment, review} is {spam, negative} etc.

  74. ➡ 10 out of 50 comments are "negative".
    ➡ 25 out of 50 comments use the word
    "horrible".
    ➡ 8 of the comments with the word "horrible"
    are marked as "negative".
    59

  75. 60
    negative: 10 comments
    "horrible": 25 comments
    both: 8 comments
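These counts can be plugged into Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), with A = "the comment is negative" and B = "the comment contains the word horrible". A minimal Python sketch, using the total of 50 comments from the previous slide; the variable names are made up for illustration.

```python
total = 50                  # all comments
negative = 10               # comments marked "negative"
horrible = 25               # comments containing "horrible"
negative_and_horrible = 8   # negative comments containing "horrible"

p_a = negative / total                           # P(negative) = 0.2
p_b = horrible / total                           # P("horrible") = 0.5
p_b_given_a = negative_and_horrible / negative   # P("horrible" | negative) = 0.8

p_a_given_b = p_b_given_a * p_a / p_b            # Bayes' theorem
print(round(p_a_given_b, 2))  # 0.32
```

So a comment containing "horrible" has a 32% chance of being negative, which matches counting directly: 8 of the 25 such comments.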

  76. 61

  77. 62
    ➡ More words?
    ➡ Complex algorithm,
    ➡ but we can assume that words are
    independent of each other
    ➡ Naive Bayes approach

  78. 63

  79. 64
    Must we know
    beforehand which
    comments are
    negative?

  80. TRAINING SET
    65

  81. 66
    Negative:
    "Your product is horrible and does
    not work properly. Also, you suck."
    Positive:
    "I had a horrible experience with
    another product. But yours really
    worked well. Thank you!"

  82. 67
    ➡ You might want to filter stop-words first.
    ➡ You might want to make sure negations are
    handled properly: "not great" => negative.
    ➡ Bonus points if you can spot sarcasm.
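A minimal naive Bayes sketch in Python, trained on just the two example comments from the training-set slide. The stop-word list is an illustrative assumption, add-one (Laplace) smoothing is a common choice rather than something the slides prescribe, and negation and sarcasm are not handled at all. A real classifier needs far more training data.

```python
import math
from collections import Counter

# Tiny training set taken from the two example comments.
TRAINING = [
    ("negative", "Your product is horrible and does not work properly. Also, you suck."),
    ("positive", "I had a horrible experience with another product. "
                 "But yours really worked well. Thank you!"),
]
# Illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "and", "is", "i", "you", "your", "with", "but", "had", "does", "also"}


def tokenize(text):
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def train(samples):
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of comments
    for label, text in samples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    vocabulary = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, counts in word_counts.items():
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        denominator = sum(counts.values()) + len(vocabulary)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING)
print(classify("This does not work, you suck", word_counts, label_counts))  # negative
```

Working with log probabilities avoids multiplying many tiny numbers, and the "naive" part is the loop that treats every word as independent of the others.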

  83. ➡ Collaborative filtering (Mahout):
    ➡ If a user likes products A, B and C, what is the
    chance that they like product D?
    68

  84. 69
    Mess up your (training) data, and nothing can save you
    (except a training set reboot)

  86. 70
    ➡ 30% chance of acceptance per CFP
    ➡ 5 CFPs
    1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
    83% chance of getting selected at least once!
    Binomial probability
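The same calculation as a Python sketch, using the binomial formula and assuming each CFP is accepted independently of the others. `math.comb` is in the standard library from Python 3.8; the function name `binomial_pmf` is made up for illustration.

```python
from math import comb


def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` independent tries."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


p_accept, cfps = 0.3, 5
p_at_least_once = 1 - binomial_pmf(0, cfps, p_accept)
print(round(p_at_least_once, 3))  # 0.832
```

The same function also answers follow-up questions, such as the chance of exactly two acceptances: `binomial_pmf(2, cfps, p_accept)`.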

  87. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg 71

  88. 72
    Find me on twitter: @jaytaph
    Find me for development and training:
    www.noxlogic.nl / www.techademy.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl
