Paradoxes and theorems every developer should know

Joshua Thijssen
July 01, 2017

Transcript

  1. Paradoxes and theorems every developer should know
    Joshua Thijssen (@jaytaph)
    namespace

  2. Disclaimer: I'm not a (mad) scientist nor a mathematician.

  3. German Tank Problem

  4. 15

  6. 53, 72, 8, 15

  7. k = number of elements
    m = largest number

  8. 72 + (72 / 4) - 1 = 89
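
The arithmetic on the slide matches the usual estimate for this problem, N ≈ m + m/k − 1, with m the largest serial number seen and k the number of serials observed. A minimal Python sketch of that formula, assuming serial numbers start at 1 and are sampled without replacement; the function name `estimate_total` is made up for illustration:

```python
def estimate_total(serials):
    """Estimate the population size from observed serial numbers.

    Uses N ≈ m + m/k - 1, where m is the largest serial seen
    and k is the number of serials observed.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


# Serial numbers from the slide: m = 72, k = 4
print(estimate_total([53, 72, 8, 15]))  # 72 + 72/4 - 1 = 89.0
```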

  12. https://en.wikipedia.org/wiki/German_tank_problem
    Month        Intelligence  Statistics  Actual
    June 1940            1000         169     122
    June 1941            1550         244     271
    August 1942          1550         327     342

  16. ➡ Data leakage.
    ➡ User IDs, invoice IDs, etc.
    ➡ Used to approximate the number of iPhones sold in 2008.

  17. Monthly Invoice IDs
    Jan       -    2476    2303
    Feb       -   10718   14891
    Mar       -   19413   27858
    Apr       -   28833   41458
    May       -   38644   55429
    Jun       -   48633   69954
    Jul  102606   59027   84961
    Aug  109331   69715  100308
    Sep  116388   80684  116020
    Oct  123721   91935  132004
    Nov  131241  103455  148341
    Dec  139236  115276  164976

  18. Estimated subscriptions
    Jan       -       -       -
    Feb       -    8242   12588
    Mar       -    8695   12967
    Apr       -    9420   13600
    May       -    9811   13971
    Jun       -    9989   14525
    Jul       -   10394   15007
    Aug    6725   10688   15347
    Sep    7057   10969   15712
    Oct    7333   11251   15984
    Nov    7520   11520   16337
    Dec    7995   11821   16635

  19. Estimated growth / size
    Jan       -       -       -
    Feb       -       -       -
    Mar       -    105%    103%
    Apr       -    108%    105%
    May       -    104%    103%
    Jun       -    102%    104%
    Jul       -    104%    103%
    Aug       -    103%    102%
    Sep    105%    103%    102%
    Oct    104%    103%    102%
    Nov    103%    102%    102%
    Dec    106%    103%    102%
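
The subscription and growth figures on these slides appear to follow from the invoice IDs alone: subtract consecutive month-end IDs, then compare consecutive differences. A small Python sketch of that calculation using one column from the slide; the function names are illustrative, and rounding to whole percentages is an assumption:

```python
def monthly_new(ids):
    """Differences between consecutive month-end invoice IDs."""
    return [later - earlier for earlier, later in zip(ids, ids[1:])]


def growth_percent(counts):
    """Month-over-month ratio of new subscriptions, as rounded percentages."""
    return [round(100 * later / earlier) for earlier, later in zip(counts, counts[1:])]


# Jan..Jul invoice IDs from one column of the slide
invoice_ids = [2476, 10718, 19413, 28833, 38644, 48633, 59027]

new_per_month = monthly_new(invoice_ids)
print(new_per_month)                  # [8242, 8695, 9420, 9811, 9989, 10394]
print(growth_percent(new_per_month))  # [105, 108, 104, 102, 104]
```

One invoice seen per month is enough for an outsider to reproduce this.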

  20. ➡ Avoid leaking (semi-)sequential data.
    ➡ Adding randomness and offsets will NOT solve the issue.
    ➡ Use UUIDs (better: time-based short IDs, you don't need UUIDs).
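
As a rough illustration of the UUID advice, the sketch below hands out random (version 4) UUIDs instead of a counter. The function name `new_invoice_id` is hypothetical, and this shows only the plain-UUID option, not the time-based short IDs the slide prefers:

```python
import uuid


def new_invoice_id():
    """Return a random (version 4) UUID as a string.

    Unlike an auto-increment counter, consecutive values reveal
    nothing about how many IDs were issued in between.
    """
    return str(uuid.uuid4())


first, second = new_invoice_id(), new_invoice_id()
print(first)
print(second)
```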

  21. Confirmation Bias

  22. Hypothesis...

  23. Evidence!

  24. Hypothesis confirmed!

  26. 2 4 6
    Z = {…, −2, −1, 0, 1, 2, …}

  27. 21%

  28. Don't try to confirm what you know.
    Try to disprove it instead.

  29. Confirmation bias is everywhere!

  30. 5 8 ? ?
    If a card shows an even number on one face,
    then its opposite face must be blue.

  31. < 10%

  32. coke beer 35 17
    If you drink beer,
    then you must be 18 years or older.
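
The rule "if beer, then 18 or older" can only be broken by a beer drinker who is under 18, so only cards that could hide that combination need to be turned over. A small Python sketch of that reasoning; the card representation and the function name `must_check` are assumptions for illustration:

```python
MIN_AGE = 18


def must_check(visible):
    """Return True if the hidden side of this card could break the rule.

    A card shows either a drink (str) or an age (int).
    Rule under test: if you drink beer, you must be 18 or older.
    """
    if isinstance(visible, str):
        # A drink is showing: only beer drinkers can violate the rule.
        return visible == "beer"
    # An age is showing: only someone under 18 can violate the rule.
    return visible < MIN_AGE


cards = ["coke", "beer", 35, 17]
print([card for card in cards if must_check(card)])  # ['beer', 17]
```

The same logic applies to the card version: turn over the even number and any card whose visible face is not blue.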

  35. Cognitive adaptation for social exchange

  36. Hint: try to place your "technical problem" in a more social context.

  37. 5 8 ? ?
    If a card shows an even number on one face,
    then its opposite face must be blue.

  40. TDD

  41. Birthday paradox

  42. Question: > 50% chance
    4 March
    18 September
    5 December
    25 July
    2 February
    9 October

  43. 23 people

  44. 366* persons = 100%
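
The "23 people" figure can be checked by multiplying out the chance that everyone has a distinct birthday. A minimal Python sketch, assuming birthdays are independent and uniformly spread over 365 days (leap days ignored); the function name is illustrative:

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes birthdays are independent and uniform over `days`.
    """
    if people > days:
        return 1.0  # pigeonhole: a shared birthday is guaranteed
    all_distinct = 1.0
    for i in range(people):
        all_distinct *= (days - i) / days
    return 1.0 - all_distinct


for n in (10, 22, 23, 50, 366):
    print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions 22 people stay just below 50%, 23 people pass it, and 366 people make a match certain.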

  45. Collisions occur more often than you realize

  46. Hash collisions

  47. 16 bit value
    300 elements

  48. random_int(1,100000)
    How many attempts before 50% collision chance?

  49. random_int(1,100000)
    117 elements
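
The same birthday calculation gives the number of random draws needed before a repeat becomes more likely than not. The Python sketch below searches for that number directly, assuming uniform, independent draws; `draws_for_collision` is a made-up helper name. As a rule of thumb the answer is close to 1.18·√N, which gives about 300 for a 16-bit value and roughly 370 for a range of 100000; a figure near 117 corresponds to a range of about 10000.

```python
def draws_for_collision(space, target=0.5):
    """Smallest number of uniform random draws from `space` values
    for which the chance of at least one repeat reaches `target`."""
    all_distinct = 1.0
    draws = 0
    while 1.0 - all_distinct < target:
        # The next draw must avoid every value drawn so far.
        all_distinct *= (space - draws) / space
        draws += 1
    return draws


print(draws_for_collision(2 ** 16))   # 16-bit value
print(draws_for_collision(100_000))   # random_int(1, 100000)
print(draws_for_collision(10_000))
```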

  50. Watch out for:
    ➡ Hashes that are too small.
    ➡ Unique data.
    ➡ Your data might be less "protected" than you might think.

  51. Heisenberg uncertainty principle

  54. x  position
    p  momentum (mass × velocity)
    ħ  1.054571800E-34 (0.0000000000000000000000000000000001054571800)
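
The three symbols above are the ones used in the standard position-momentum form of the principle, which is usually written as:

```latex
\sigma_x \, \sigma_p \;\geq\; \frac{\hbar}{2}
```

Here \sigma_x and \sigma_p are the standard deviations (uncertainties) of position and momentum, and \hbar is the reduced Planck constant in joule-seconds.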

  55. The more precisely you know one property,
    the less precisely you know the other.

  56. It's about trade-offs

  57. This is NOT about observing!

  58. Observer effect
    heisenbug

  59. Benford's law

  60. Numbers beginning with 1 are more common than numbers beginning with 9.

  61. Default behavior for natural numbers.

  64. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c
    1073 1
    886 2
    636 3
    372 4
    352 5
    350 6
    307 7
    247 8
    222 9
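
Benford's law predicts that a leading digit d appears with probability log10(1 + 1/d). The Python sketch below compares that prediction with the leading-digit counts shown on the slide; the variable and function names are illustrative:

```python
import math

# Leading-digit counts from the slide's `uniq -c` output
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}


def benford_share(digit):
    """Expected share of numbers with this leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


total = sum(observed.values())
for digit, count in observed.items():
    print(f"{digit}: observed {count / total:6.1%}  expected {benford_share(digit):6.1%}")
```

The line counts follow the downward trend without matching the predicted shares exactly.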

  66. Bayesian filtering

  67. What's the probability of an event, based on conditions that might be related to the event?

  68. What is the chance that a message is spam when it contains certain words?

  69. P(A|B)  Probability of event A, given event B (conditional)
    P(A)    Probability of event A
    P(B)    Probability of event B
    P(B|A)  Probability of event B, given event A

  70. ➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

  71. ➡ 10 out of 50 comments are "negative".
    ➡ 25 out of 50 comments use the word "horrible".
    ➡ 8 comments with the word "horrible" are marked as "negative".

  72. negative: 10 comments
    "horrible": 25 comments
    negative and "horrible": 8 comments
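
With these counts, Bayes' theorem gives the chance that a comment containing "horrible" is negative. A minimal Python sketch, taking A as "comment is negative" and B as "comment contains 'horrible'"; exact fractions are used so the arithmetic is easy to follow, and the variable names are illustrative:

```python
from fractions import Fraction

total = 50
negative = 10          # comments marked "negative"
horrible = 25          # comments containing "horrible"
both = 8               # negative comments containing "horrible"

p_negative = Fraction(negative, total)                # P(A)
p_horrible = Fraction(horrible, total)                # P(B)
p_horrible_given_negative = Fraction(both, negative)  # P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_negative_given_horrible = p_horrible_given_negative * p_negative / p_horrible

print(p_negative_given_horrible, float(p_negative_given_horrible))  # 8/25 0.32
```

The result equals 8 out of 25 counted directly.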

  75. ➡ You might want to filter stop-words first.
    ➡ You might want to make sure negations are handled properly: "not great" => negative.
    ➡ Bonus points if you can spot sarcasm.

  76. ➡ Collaborative filtering (Mahout):
    ➡ If a user likes products A, B and C, what is the chance that they like product D?

  77. Mess up your (training) data, and nothing can save you
    (except a training set reboot)

  79. Find me on twitter: @jaytaph
    Find me for development and training:
    www.noxlogic.nl / www.techademy.nl
    Find me on email: [email protected]