Paradoxes and theorems every developer should know

Joshua Thijssen

June 21, 2016

Transcript

  1. Joshua Thijssen
    jaytaph
    namespace

  2. Joshua Thijssen
    Consultant and trainer @ NoxLogic
    Founder of TechAnalyze.io
    Symfony Rainbow Books author
    Mastering the SPL author
    Blog: http://adayinthelifeof.nl
    Email: [email protected]
    Twitter: @jaytaph

  3. https://dutchtechrecruitment.nl/

  4. Disclaimer:
    I'm not a (mad) scientist nor a mathematician.

  5. German Tank Problem

  6. N ≈ m + (m / k) - 1
    k = number of elements
    m = largest number

  7. 72 + (72 / 4) - 1 = 89
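
A minimal Python sketch of the estimate above, assuming serial numbers start at 1 and every serial is equally likely to be observed. The function name `estimate_total` and the sample serials are hypothetical; the slides only give m = 72 and k = 4.

```python
def estimate_total(serials):
    """Estimate the population size as m + m/k - 1."""
    k = len(serials)  # k = number of elements observed
    m = max(serials)  # m = largest number observed
    return m + m / k - 1


# Hypothetical sample: four observed serials, the largest being 72.
observed = [19, 40, 42, 72]
print(estimate_total(observed))  # 89.0
```

The same calculation works on any leaked sequential identifier, such as invoice or user IDs.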

  8. Estimates vs. actual:
    Month         Intelligence   Statistics   Actual
    June 1940     1000           169          122
    June 1941     1550           244          271
    August 1942   1550           327          342
    https://en.wikipedia.org/wiki/German_tank_problem

  12. ➡ Data leakage.
    ➡ User IDs, invoice IDs, etc.
    ➡ Used to approximate the number of iPhones sold in 2008.
    ➡ Approximate the size of a dataset from (incomplete) information.

  16. ➡ Avoid leaking (semi-)sequential data.
    ➡ Adding randomness and offsets will NOT solve the issue.
    ➡ Use UUIDs
    (better: time-based short IDs, you don't need UUIDs)
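
One way to hand out identifiers that do not reveal counts, sketched in Python with the standard library only. The function names are hypothetical, and this covers random identifiers only, not the time-based short IDs the slide prefers.

```python
import secrets
import uuid


def new_public_id():
    """Random (version 4) UUID: carries no information about how many exist."""
    return str(uuid.uuid4())


def new_short_id(num_bytes=9):
    """Shorter random token; 9 random bytes give a 12-character URL-safe string."""
    return secrets.token_urlsafe(num_bytes)


print(new_public_id())
print(new_short_id())
```

Shorter tokens collide sooner, so the length is a trade-off against the number of IDs you expect to issue.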

  17. Collecting (big) data is easy.
    Analyzing big data is the hard part.

  18. Confirmation Bias

  19. 2 4 6
    Z = {…, −2, −1, 0, 1, 2, …}

  20. 5 8 ? ?
    If a card shows an even number on one face, then its opposite face is blue.

  21. coke beer 35 17
    If you drink beer, then you must be 18 years or older.

  24. Cognitive adaptation for social exchange

  25. Hint:
    Try to place your "technical problem" in a more social context.

  26. 5 8 ? ?
    If a card shows an even number on one face, then its opposite face is blue.

  29. ➡ Step 1: Write code
    ➡ Step 2: Write tests
    ➡ Step 3: Profit

  30. // Passes every test below, yet wrongly reports 1900 and 2100 as leap years.
    public function isLeapYear($year) {
        return ($year % 4 == 0);
    }
    testIs1996ALeapYear();
    testIs2000ALeapYear();
    testIs2004ALeapYear();
    testIs2008ALeapYear();
    testIs2012ALeapYear();
    testIs1997NotALeapYear();
    testIs1998NotALeapYear();
    testIs2001NotALeapYear();
    testIs2013NotALeapYear();
    https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing

  33. ➡ Tests were written based on the actual code.
    ➡ Tests were written to CONFIRM the actual code, not to DISPROVE it!
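
A sketch in Python (not the PHP of the slide) of tests that try to disprove the code instead of confirming it. The function names are illustrative; the century years are exactly the cases that tests for 1996 through 2013 never exercise.

```python
def naive_is_leap_year(year):
    """The rule from the slide: divisible by 4."""
    return year % 4 == 0


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Cases chosen to break the naive rule instead of confirming it.
for year, expected in [(1900, False), (2000, True), (2100, False), (2012, True)]:
    assert is_leap_year(year) == expected

# The naive rule gets the century years wrong.
print(naive_is_leap_year(1900), is_leap_year(1900))  # True False
```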

  34. ➡ Step 1: Write tests
    ➡ Step 2: Write code
    ➡ Step 3: Profit: less prone to confirmation bias (there is no code yet to be biased by!)

  35. Birthday paradox

  36. Question:
    > 50% chance
    4 March
    18 September
    5 December
    25 July
    2 February
    9 October

  37. 366 people = 100%
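
A small Python sketch of the birthday calculation, assuming 365 equally likely birthdays and no leap day. The function name is hypothetical. Under these assumptions the probability first exceeds 50% at 23 people.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1 - p_all_different


for n in (10, 22, 23, 366):
    print(n, round(shared_birthday_probability(n), 3))
```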

  38. Collisions occur more often than you realize

  39. Hash collisions

  40. 16 bits means roughly 300 values before a > 50% collision probability
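
The same birthday calculation applied to hash sizes, as a Python sketch. It assumes hash values are uniformly distributed; the function name is hypothetical. For 16 bits it lands at roughly 300 values, in line with the slide.

```python
def values_until_collision_likely(bits, threshold=0.5):
    """Smallest number of random `bits`-bit values whose collision chance exceeds `threshold`."""
    space = 2 ** bits
    p_all_different = 1.0
    n = 0
    while 1 - p_all_different <= threshold:
        # Adding value number n+1: it must avoid the n values already seen.
        p_all_different *= (space - n) / space
        n += 1
    return n


for bits in (8, 16):
    print(bits, values_until_collision_likely(bits))
```

The count grows with the square root of the hash space, so doubling the number of bits squares the number of values you can store before a collision becomes likely. The loop is only practical for small bit counts.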

  41. Watch out for:
    ➡ Hashes that are too small.
    ➡ Unique data.
    ➡ Your data might be less "protected" than you might think.

  42. Heisenberg uncertainty principle

  43. It's not about Star Trek
    (Heisenberg compensators)

  44. nor crystal meth

  45. Δx · Δp ≥ ħ / 2
    x  position
    p  momentum (mass × velocity)
    ħ  1.054571800 × 10^-34 J·s

  46. The more precisely you know one property, the less precisely you can know the other.

  47. This is NOT about observing!

  48. Observer effect
    heisenbug

  49. It's about trade-offs

  50. Benford's law

  51. Numbers beginning with 1 are more common than numbers beginning with 9.

  52. Typical behavior for naturally occurring numbers.

  53. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c
    1073 1
    886 2
    636 3
    372 4
    352 5
    350 6
    307 7
    247 8
    222 9
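
A Python sketch comparing the counts above with what Benford's law predicts, where the expected share of leading digit d is log10(1 + 1/d). The counts are copied from the output above; the function name is hypothetical.

```python
import math

# Leading-digit counts taken from the output above.
counts = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}


def benford_share(digit):
    """Expected share of numbers whose first digit is `digit`, per Benford's law."""
    return math.log10(1 + 1 / digit)


total = sum(counts.values())
for digit, count in counts.items():
    print(f"{digit}: observed {count / total:.1%}, Benford {benford_share(digit):.1%}")
```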

  55. Bayesian filtering

  56. What's the probability of an event, based on conditions that might be related to the event?

  57. What is the chance that a message is spam when it contains certain words?

  58. P(A|B) = P(B|A) · P(A) / P(B)
    P(A|B)  Probability of event A, given event B (conditional)
    P(A)    Probability of event A
    P(B)    Probability of event B
    P(B|A)  Probability of event B, given event A

  59. ➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

  60. ➡ 10 out of 50 comments are "negative".
    ➡ 25 out of 50 comments use the word "horrible".
    ➡ 8 comments with the word "horrible" are marked as "negative".

  61. [Venn diagram] negative: 10 comments; "horrible": 25 comments; overlap: 8 comments
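
The numbers above can be plugged into Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), with A = "comment is negative" and B = "comment contains horrible". A minimal Python sketch; the variable names are illustrative.

```python
total_comments = 50
negative = 10              # comments marked "negative"
horrible = 25              # comments containing "horrible"
negative_and_horrible = 8  # comments that are both

p_negative = negative / total_comments                        # P(A)
p_horrible = horrible / total_comments                        # P(B)
p_horrible_given_negative = negative_and_horrible / negative  # P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_negative_given_horrible = p_horrible_given_negative * p_negative / p_horrible
print(round(p_negative_given_horrible, 2))  # 0.32
```

With a single word this equals counting directly (8 of the 25 "horrible" comments are negative); the formula pays off once several words are combined.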

  62. ➡ More words?
    ➡ Complex algorithm,
    ➡ but we can assume that words are independent of each other
    ➡ Naive Bayes approach

  63. We must know beforehand which comments are negative?

  64. TRAINING SET

  65. Negative:
    "Your product is horrible and does not work properly. Also, you suck."
    Positive:
    "I had a horrible experience with another product. But yours really worked well. Thank you!"
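
A toy naive Bayes classifier in Python, trained on only the two comments above. This is a sketch under stated assumptions: words are treated as independent, add-one (Laplace) smoothing handles unseen words, and all function names are hypothetical. A real training set needs far more examples.

```python
import math
import re
from collections import Counter

TRAINING_SET = [
    ("negative", "Your product is horrible and does not work properly. Also, you suck."),
    ("positive", "I had a horrible experience with another product. "
                 "But yours really worked well. Thank you!"),
]


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # prior P(label)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the product.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_SET)
print(classify("does not work", model))           # negative
print(classify("worked well, thank you", model))  # positive
```

Note that "horrible" appears in both training comments, so on its own it carries no signal here.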

  66. ➡ You might want to filter stop-words first.
    ➡ You might want to make sure negations are handled properly: "not great" => negative.
    ➡ Bonus points if you can spot sarcasm.

  67. ➡ Collaborative filtering (Mahout):
    ➡ If a user likes products A, B and C, what is the chance that they like product D?

  68. Mess up your (training) data, and nothing can save you
    (except a training set reboot)

  69. Binomial probability
    ➡ 30% chance of acceptance per CFP
    ➡ 5 CFPs
    1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
    83% chance of getting selected at least once!
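
The calculation above, plus the full binomial distribution, as a short Python sketch. It assumes each CFP is decided independently with the same 30% chance; the function names are hypothetical.

```python
from math import comb


def exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def at_least_one(n, p):
    """Chance of at least one success: 1 minus the chance of n failures."""
    return 1 - (1 - p) ** n


print(round(at_least_one(5, 0.3), 3))  # 0.832
for k in range(6):
    print(k, round(exactly(k, 5, 0.3), 3))
```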

  71. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

  72. Find me on twitter: @jaytaph
    Find me for development and training:
    www.noxlogic.nl / www.techademy.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl