Paradoxes and theorems every developer should know

Joshua Thijssen
July 01, 2017

Transcript

  1. @jaytaph Joshua Thijssen: Paradoxes and theorems every developer should know
  2. Disclaimer: I'm not a (mad) scientist nor a mathematician.
  3. German Tank Problem
  4. Serial number seen: 15
  6. Serial numbers seen: 53, 72, 8, 15
  7. k = number of elements, m = largest number
  8. Estimate: m + (m / k) - 1 = 72 + (72 / 4) - 1 = 89
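The estimate on the slide can be reproduced with a few lines of Python. This is a minimal sketch of the m + m/k - 1 rule only; the function name `estimate_population` is made up for illustration, and it assumes serial numbers start at 1 and are sampled without replacement.

```python
def estimate_population(serials):
    """Estimate the total from observed serial numbers using m + m/k - 1."""
    if not serials:
        raise ValueError("need at least one observed serial number")
    k = len(serials)  # number of elements observed
    m = max(serials)  # largest serial number observed
    return m + m / k - 1


observed = [53, 72, 8, 15]
print(estimate_population(observed))  # 89.0
```

With a single observation the rule gives 2m - 1, which is why more samples tighten the estimate quickly.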
  12. German tank production, estimates versus actual (https://en.wikipedia.org/wiki/German_tank_problem)

    Month       | Intelligence | Statistics | Actual
    June 1940   |         1000 |        169 |    122
    June 1941   |         1550 |        244 |    271
    August 1942 |         1550 |        327 |    342
  16. ➡ Data leakage.
    ➡ User IDs, invoice IDs, etc.
    ➡ Used to approximate the number of iPhones sold in 2008.
  17. Monthly invoice IDs

    Month |    (1) |    (2) |    (3)
    Jan   |        |   2476 |   2303
    Feb   |        |  10718 |  14891
    Mar   |        |  19413 |  27858
    Apr   |        |  28833 |  41458
    May   |        |  38644 |  55429
    Jun   |        |  48633 |  69954
    Jul   | 102606 |  59027 |  84961
    Aug   | 109331 |  69715 | 100308
    Sep   | 116388 |  80684 | 116020
    Oct   | 123721 |  91935 | 132004
    Nov   | 131241 | 103455 | 148341
    Dec   | 139236 | 115276 | 164976

  18. Estimated subscriptions (difference between consecutive monthly invoice IDs)

    Month |  (1) |   (2) |   (3)
    Jan   |      |       |
    Feb   |      |  8242 | 12588
    Mar   |      |  8695 | 12967
    Apr   |      |  9420 | 13600
    May   |      |  9811 | 13971
    Jun   |      |  9989 | 14525
    Jul   |      | 10394 | 15007
    Aug   | 6725 | 10688 | 15347
    Sep   | 7057 | 10969 | 15712
    Oct   | 7333 | 11251 | 15984
    Nov   | 7520 | 11520 | 16337
    Dec   | 7995 | 11821 | 16635

  19. Estimated growth / size

    Month |  (1) |  (2) |  (3)
    Jan   |      |      |
    Feb   |      |      |
    Mar   |      | 105% | 103%
    Apr   |      | 108% | 105%
    May   |      | 104% | 103%
    Jun   |      | 102% | 104%
    Jul   |      | 104% | 103%
    Aug   |      | 103% | 102%
    Sep   | 105% | 103% | 102%
    Oct   | 104% | 103% | 102%
    Nov   | 103% | 102% | 102%
    Dec   | 106% | 103% | 102%
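The subscription and growth figures follow from plain subtraction and division on the leaked IDs. The sketch below uses the Jul to Dec invoice IDs that start at 102606 on the slide; the function names are made up for illustration, and it assumes one invoice per subscription per month.

```python
def monthly_volume(ids):
    """Differences between consecutive monthly invoice IDs."""
    return [later - earlier for earlier, later in zip(ids, ids[1:])]


def growth_percent(volumes):
    """Each month's volume as a rounded percentage of the previous month's."""
    return [round(100 * later / earlier) for earlier, later in zip(volumes, volumes[1:])]


# Invoice IDs observed once a month, Jul through Dec
invoice_ids = [102606, 109331, 116388, 123721, 131241, 139236]

volumes = monthly_volume(invoice_ids)
print(volumes)                  # [6725, 7057, 7333, 7520, 7995]
print(growth_percent(volumes))  # [105, 104, 103, 106]
```

One observed ID per month is enough for an outsider to track both volume and growth rate.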
  20. ➡ Avoid leaking (semi-)sequential data.
    ➡ Adding randomness and offsets will NOT solve the issue.
    ➡ Use UUIDs (better: time-based short IDs; you don't need UUIDs).
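As a rough illustration of both options, the sketch below generates a random UUID with Python's standard `uuid` module and a time-based short ID. The helpers `to_base36` and `short_time_id` are hypothetical and not taken from the talk; the short ID is a millisecond timestamp in base 36 followed by random characters, so it reveals when a record was created but not how many records exist.

```python
import secrets
import time
import uuid

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Encode a non-negative integer in base 36."""
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def short_time_id(random_chars=6):
    """Hypothetical short ID: base-36 millisecond timestamp plus random suffix."""
    millis = int(time.time() * 1000)
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(random_chars))
    return to_base36(millis) + suffix


print(uuid.uuid4())     # random 128-bit identifier
print(short_time_id())  # much shorter, roughly sortable by creation time
```

The suffix length is a trade-off: more random characters lower the chance of two IDs colliding within the same millisecond.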
  21. Confirmation Bias
  22. Hypothesis...
  23. Evidence!
  24. Hypothesis confirmed!
  26. 2, 4, 6; Z = {…, −2, −1, 0, 1, 2, …}
  27. 21%
  28. Don't try and confirm what you know. Try and disprove instead.
  29. Confirmation bias is everywhere!

  30. Cards: 5, 8, ?, ?. Rule: if a card shows an even number on one face, then its opposite face must be blue.
  31. < 10%
  34. Cards: coke, beer, 35, 17. Rule: if you drink beer, then you must be 18 years or older.
  35. Cognitive adaptation for social exchange
  36. Hint: try to place your "technical problem" in a more social context.
  39. Cards: 5, 8, ?, ?. Rule: if a card shows an even number on one face, then its opposite face must be blue.
  40. TDD

  41. Birthday paradox
  42. Question: how many people does it take for a > 50% chance that two of them share a birthday? (4 March, 18 September, 5 December, 25 July, 2 February, 9 October)
  43. 23 people
  44. 366* persons = 100%
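The 23-person figure can be checked by multiplying out the chance that everyone's birthday is different. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function name is illustrative.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    all_distinct = 1.0
    for already_taken in range(people):
        all_distinct *= (days - already_taken) / days
    return 1.0 - all_distinct


print(round(shared_birthday_probability(22), 4))  # 0.4757
print(round(shared_birthday_probability(23), 4))  # 0.5073
print(shared_birthday_probability(366))           # 1.0
```

The probability climbs quickly because what matters is the number of pairs, which grows roughly with the square of the group size.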

  45. Collisions occur more often than you realize

  46. Hash collisions
  47. 16-bit value: 300 elements
  48. random_int(1,100000): how many attempts before a 50% collision chance?

  49. random_int(1,100000): about 373 elements (117 is roughly the figure for a range of 10,000)
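The same calculation generalizes to any value space. The sketch below finds the smallest number of uniform random draws at which a collision becomes at least 50% likely, alongside the common square-root approximation. It models the draws mathematically rather than calling PHP's `random_int`, and the function names are illustrative.

```python
import math


def draws_for_collision(space, target=0.5):
    """Smallest number of uniform draws whose collision chance reaches `target`."""
    all_distinct = 1.0
    draws = 0
    while 1.0 - all_distinct < target:
        # Chance that the next draw misses every value drawn so far
        all_distinct *= (space - draws) / space
        draws += 1
    return draws


def approx_draws(space):
    """Square-root approximation of the 50% point: sqrt(2 * N * ln 2)."""
    return math.sqrt(2 * space * math.log(2))


print(draws_for_collision(365))      # 23
print(draws_for_collision(2 ** 16))  # 302
print(draws_for_collision(100_000))  # 373
print(round(approx_draws(100_000)))  # close to the exact value
```

As a rule of thumb, the 50% point sits near 1.18 times the square root of the number of possible values.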

  50. Watch out for:
    ➡ Hashes that are too small.
    ➡ Unique data.
    ➡ Your data might be less "protected" than you think.
  51. Heisenberg uncertainty principle
  54. x = position, p = momentum (mass × velocity), ħ = 0.0000000000000000000000000000000001054571800 (1.054571800E-34)
  55. The more precisely you know one property, the less you know the other.
  56. It's about trade-offs
  57. This is NOT about observing!
  58. Observer effect: heisenbug

  59. Benford's law
  60. Numbers beginning with 1 are more common than numbers beginning with 9.
  61. Default behavior for naturally occurring numbers.

  64. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

    Count | Leading digit
     1073 | 1
      886 | 2
      636 | 3
      372 | 4
      352 | 5
      350 | 6
      307 | 7
      247 | 8
      222 | 9
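To see how closely those line counts track Benford's law, the observed shares can be set against the predicted share log10(1 + 1/d) for each leading digit d. A small sketch using the counts from the command output; the variable and function names are illustrative.

```python
import math


def benford_expected(digit):
    """Share of values Benford's law predicts for a leading digit 1..9."""
    return math.log10(1 + 1 / digit)


# Leading-digit counts taken from the command output
counts = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352, 6: 350, 7: 307, 8: 247, 9: 222}
total = sum(counts.values())

for digit, count in counts.items():
    observed = count / total
    print(f"{digit}: observed {observed:.1%}, Benford {benford_expected(digit):.1%}")
```

Benford predicts about 30.1% for a leading 1 and 4.6% for a leading 9; the sample follows the same downward trend without matching it exactly.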
  66. Bayesian filtering
  67. What's the probability of an event, based on conditions that might be related to the event?
  68. What is the chance that a message is spam when it contains certain words?
  69. P(A|B) = P(B|A) × P(A) / P(B)
    P(A|B): probability of event A, if event B (conditional)
    P(A): probability of event A
    P(B): probability of event B
    P(B|A): probability of event B, if event A
  70. ➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.
  71. ➡ 10 out of 50 comments are "negative".
    ➡ 25 out of 50 comments use the word "horrible".
    ➡ 8 comments with the word "horrible" are marked as "negative".
  72. "negative": 10 comments; "horrible": 25 comments; both: 8 comments
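Plugging these counts into Bayes' theorem gives the chance that a comment containing "horrible" is negative. A minimal sketch with exact fractions, taking A as "comment is negative" and B as "comment contains horrible"; the variable names are illustrative.

```python
from fractions import Fraction

total_comments = 50
negative = 10               # comments marked "negative"
horrible = 25               # comments containing "horrible"
negative_and_horrible = 8   # comments that are both

p_a = Fraction(negative, total_comments)                # P(A)
p_b = Fraction(horrible, total_comments)                # P(B)
p_b_given_a = Fraction(negative_and_horrible, negative)  # P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b, float(p_a_given_b))  # 8/25 0.32
```

The result matches the direct count of 8 negative comments among the 25 containing the word, a useful sanity check before applying the formula to many words at once.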

  75. ➡ You might want to filter stop-words first.
    ➡ You might want to make sure negatives are handled properly: "not great" => negative.
    ➡ Bonus points if you can spot sarcasm.
  76. ➡ Collaborative filtering (Mahout):
    ➡ If a user likes products A, B and C, what is the chance that they like product D?
  77. Mess up your (training) data, and nothing can save you (except a training set reboot).
  79. Find me on twitter: @jaytaph
    Find me for development and training: www.noxlogic.nl / www.techademy.nl
    Find me on email: jthijssen@noxlogic.nl