Paradoxes and theorems every developer should know

Joshua Thijssen

June 21, 2016

Transcript

  1. 1 Joshua Thijssen jaytaph <?php namespace

  2. Joshua Thijssen. Consultant and trainer @ NoxLogic. Founder of TechAnalyze.io. Symfony Rainbow Books author. Mastering the SPL author. Blog: http://adayinthelifeof.nl Email: jthijssen@noxlogic.nl Twitter: @jaytaph

  3. https://dutchtechrecruitment.nl/

  4. Disclaimer: I'm not a (mad) scientist nor a mathematician.

  5. German Tank Problem

  7. Observed serial number: 15

  9. Observed serial numbers: 53, 72, 8, 15

  10. Estimated total = m + (m / k) - 1, where k = number of elements and m = largest number

  11. 72 + (72 / 4) - 1 = 89
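
The estimate on the slides can be sketched in a few lines of Python. This is a minimal illustration that assumes the observed serial numbers are a random sample from 1..N; the function name `estimate_total` is made up for this example.

```python
def estimate_total(serials):
    """Estimate the population size from observed serial numbers.

    Uses the formula from the slides: m + (m / k) - 1,
    where m is the largest serial seen and k is the sample size.
    """
    if not serials:
        raise ValueError("need at least one observed serial number")
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


# The four serial numbers from the slides: largest is 72, sample size is 4.
print(estimate_total([53, 72, 8, 15]))  # 72 + 18 - 1 = 89.0
```

The same function works for any leaked sequential identifier, such as the highest invoice number a customer has seen.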

  12. Source: https://en.wikipedia.org/wiki/German_tank_problem

    Month        Intelligence  Statistics  Actual
    June 1940            1000         169     122
    June 1941            1550         244     271
    August 1942          1550         327     342
  17. ➡ Data leakage.
    ➡ User IDs, invoice IDs, etc.
    ➡ Used to approximate the number of iPhones sold in 2008.
    ➡ Calculate approximations of datasets with (incomplete) information.
  22. ➡ Avoid leaking (semi-)sequential data.
    ➡ Adding randomness and offsets will NOT solve the issue.
    ➡ Use UUIDs (better: time-based short IDs, you don't need UUIDs).
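
One way to follow this advice, sketched in Python: keep the sequential key internal and expose a random identifier instead. This assumes a random version-4 UUID is acceptable as the public ID; the `Invoice` class and its field names are invented for the illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Invoice:
    # Sequential database key: never shown to customers, because the
    # largest one seen leaks roughly how many invoices exist.
    internal_id: int
    # Random public identifier: carries no ordering or count information.
    public_id: str = field(default_factory=lambda: str(uuid.uuid4()))


first = Invoice(internal_id=1)
second = Invoice(internal_id=2)
print(first.public_id, second.public_id)
```
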
  23. Collecting (big) data is easy. Analyzing big data is the hard part.

  24. Confirmation Bias

  25. 2 4 6   Z={…,−2,−1,0,1,2,…}

  26. 21%

  27. Cards: 5, 8, ?, ?. Rule: if a card shows an even number on one face, then its opposite face is blue.

  28. < 10%

  29. Cards: coke, beer, 35, 17. Rule: if you drink beer, then you must be 18 years or older.

  32. Cognitive adaptation for social exchange

  33. Hint: try to place your "technical problem" in a more social context.
  34. BDD

  35. Cards: 5, 8, ?, ?. Rule: if a card shows an even number on one face, then its opposite face is blue.

  38. TESTING

  39. ➡ Step 1: Write code
    ➡ Step 2: Write tests
    ➡ Step 3: Profit
  40. // Passes every test listed below, yet ignores the Gregorian century rule (e.g. 1900).
    public function isLeapYeap($year) {
        return ($year % 4 == 0);
    }

    testIs1996ALeapYeap();
    testIs2000ALeapYeap();
    testIs2004ALeapYeap();
    testIs2008ALeapYeap();
    testIs2012ALeapYeap();
    testIs1997NotALeapYear();
    testIs1998NotALeapYear();
    testIs2001NotALeapYear();
    testIs2013NotALeapYear();

    https://www.sundoginteractive.com/blog/confirmation-bias-in-unit-testing
  43. ➡ Tests were written based on actual code.
    ➡ Tests were written to CONFIRM actual code, not to DISPROVE actual code!
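
The leap-year slides can be replayed in Python. This is a sketch, not the code from the talk: `is_leap_year_naive` mirrors the modulo-4 check from the slide, and `is_leap_year` applies the full Gregorian rule. The confirming tests pass for both, while a test that tries to disprove the code (the year 1900) only passes for the second.

```python
def is_leap_year_naive(year):
    """Mirror of the slide's implementation: divisible by 4."""
    return year % 4 == 0


def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The years tested on the slide: both versions agree on all of them.
confirming = {1996: True, 2000: True, 2004: True, 2008: True, 2012: True,
              1997: False, 1998: False, 2001: False, 2013: False}
for year, expected in confirming.items():
    assert is_leap_year_naive(year) == expected
    assert is_leap_year(year) == expected

# A test written to disprove the code: 1900 was not a leap year.
print(is_leap_year_naive(1900))  # True  (wrong)
print(is_leap_year(1900))        # False (correct)
```
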
  44. TDD

  45. ➡ Step 1: Write tests
    ➡ Step 2: Write code
    ➡ Step 3: Profit, as this is less prone to confirmation bias (there is nothing to be biased by yet!)
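  46. Birthday paradox

  47. Question: how many people are needed for a > 50% chance that two of them share a birthday? Example birthdays: 4 March, 18 September, 5 December, 25 July, 2 February, 9 October

  48. 23 people

  49. 366 people = 100% (with 365 possible birthdays)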
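
Both numbers can be checked with a short Python sketch. It assumes 365 equally likely birthdays and ignores leap days; the function names are made up for this example.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: more people than days
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose collision probability exceeds threshold."""
    people = 1
    while shared_birthday_probability(people, days) <= threshold:
        people += 1
    return people


print(smallest_group())                            # 23
print(round(shared_birthday_probability(23), 3))   # 0.507
print(shared_birthday_probability(366))            # 1.0
```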

  50. Collisions occur more often than you realize

  51. Hash collisions

  52. 16 bits means about 300 values before a >50% collision probability
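
The figure of roughly 300 follows from the usual birthday-bound approximation. The Python sketch below assumes hash values are uniformly distributed; `values_for_collision` is a name invented for this example, and the result is an approximation rather than an exact count.

```python
import math


def values_for_collision(bits, probability=0.5):
    """Approximate number of random values before a collision is likely.

    Uses the standard birthday-bound approximation
    n ~ sqrt(2 * d * ln(1 / (1 - p))) with d = 2 ** bits.
    """
    d = 2 ** bits
    return math.sqrt(2 * d * math.log(1 / (1 - probability)))


for bits in (16, 32, 64):
    print(bits, round(values_for_collision(bits)))
```

For 16 bits this gives about 301 values, which matches the slide.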

  53. Watch out for:
    ➡ Too small hashes.
    ➡ Unique data.
    ➡ Your data might be less "protected" than you might think.
  54. Heisenberg uncertainty principle

  55. It's not about Star Trek (Heisenberg compensators)

  56. nor crystal meth

  57. Δx · Δp ≥ ħ / 2, where x = position, p = momentum (mass × velocity), ħ = 0.0000000000000000000000000000000001054571800 (1.054571800E-34) J·s

  58. The more precisely you know one property, the less precisely you know the other.

  59. This is NOT about observing!

  60. Observer effect: heisenbug

  61. It's about trade-offs

  62. Benford's law

  63. Numbers beginning with 1 are more common than numbers beginning with 9.

  64. Default behavior for many naturally occurring sets of numbers.

  66. find . -name \*.php -exec wc -l {} \; | sort | cut -b 1 | uniq -c

    1073 1
     886 2
     636 3
     372 4
     352 5
     350 6
     307 7
     247 8
     222 9
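
To see how close these counts come to Benford's law, the Python sketch below compares them with the expected share log10(1 + 1/d) for each leading digit d. The counts are copied from the slide; everything else is illustrative.

```python
import math

# Counts per leading digit, copied from the slide's `uniq -c` output.
observed = {1: 1073, 2: 886, 3: 636, 4: 372, 5: 352,
            6: 350, 7: 307, 8: 247, 9: 222}


def benford_expected(digit):
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d."""
    return math.log10(1 + 1 / digit)


total = sum(observed.values())
for digit, count in observed.items():
    print(f"{digit}: observed {count / total:.1%}, "
          f"expected {benford_expected(digit):.1%}")
```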

  69. Bayesian filtering

  70. What's the probability of an event, based on conditions that might be related to the event?

  71. What is the chance that a message is spam when it contains certain words?

  72. P(A|B) = P(B|A) · P(A) / P(B)
    P(A|B): probability of event A, given event B (conditional)
    P(A): probability of event A
    P(B): probability of event B
    P(B|A): probability of event B, given event A

  73. ➡ Figure out the probability that a {mail, tweet, comment, review} is {spam, negative}, etc.

  74. ➡ 10 out of 50 comments are "negative".
    ➡ 25 out of 50 comments use the word "horrible".
    ➡ 8 comments with the word "horrible" are marked as "negative".
  75. Venn diagram: negative (10 comments), "horrible" (25 comments), overlap (8 comments)
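
Plugging the comment counts into Bayes' theorem gives the chance that a comment containing "horrible" is negative. A minimal Python sketch, with variable names chosen for this example:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


total = 50             # all comments
negative = 10          # comments marked "negative"
horrible = 25          # comments containing "horrible"
both = 8               # negative comments containing "horrible"

p_negative = negative / total                  # P(A)   = 0.2
p_horrible = horrible / total                  # P(B)   = 0.5
p_horrible_given_negative = both / negative    # P(B|A) = 0.8

p = bayes(p_horrible_given_negative, p_negative, p_horrible)
print(round(p, 2))  # 0.32
```

The result equals 8 / 25, the share of "horrible" comments that are negative, which is a quick sanity check on the formula.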

  77. ➡ More words?
    ➡ Complex algorithm,
    ➡ but we can assume that words are independent of each other
    ➡ Naive Bayes approach

  79. Do we need to know beforehand which comments are negative?

  80. TRAINING SET

  81. Negative: "Your product is horrible and does not work properly. Also, you suck."
    Positive: "I had a horrible experience with another product. But yours really worked well. Thank you!"

  82. ➡ You might want to filter stop-words first.
    ➡ You might want to make sure negations are handled properly: "not great" => negative.
    ➡ Bonus points if you can spot sarcasm.
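
A toy Naive Bayes classifier, sketched in Python on the two training comments from the slides. The stop-word list, the tokenizer, and all function names are assumptions made for this example, and it does not handle negations or sarcasm. Two comments are far too few for real use; the sketch only shows the mechanics.

```python
import math
import re
from collections import Counter

# The two training comments from the slides, labeled by hand.
TRAINING_SET = [
    ("negative", "Your product is horrible and does not work properly. "
                 "Also, you suck."),
    ("positive", "I had a horrible experience with another product. "
                 "But yours really worked well. Thank you!"),
]

# A tiny, made-up stop-word list; real lists are much longer.
STOP_WORDS = {"a", "and", "i", "is", "with", "you", "your", "yours", "but"}


def tokenize(text):
    """Lowercase, split into words, and drop stop-words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for text."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, counts in word_counts.items():
        # Prior: share of training documents with this label.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing avoids log(0) for unseen words.
            score += math.log((counts[word] + 1)
                              / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING_SET)
print(classify("This does not work, you suck", word_counts, label_counts))
print(classify("Worked really well, thank you", word_counts, label_counts))
```

Note that "horrible" appears in both training comments, so on its own it does not push a new comment toward either label.
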
  83. ➡ Collaborative filtering (Mahout):
    ➡ If a user likes products A, B and C, what is the chance that they like product D?

  84. Mess up your (training) data, and nothing can save you (except a training set reboot)
  85. Binomial probability
    ➡ 30% chance of acceptance for a CFP
    ➡ 5 CFPs
    1 - (0.7 * 0.7 * 0.7 * 0.7 * 0.7) = 1 - 0.168 = 0.832
    83% chance of getting selected at least once!
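
The CFP calculation generalizes to any acceptance rate and number of submissions. A short Python sketch, assuming each submission is accepted independently with the same probability; the function names are made up for this example.

```python
from math import comb


def at_least_one(p, n):
    """Probability of at least one success in n independent tries."""
    return 1 - (1 - p) ** n


def exactly(k, p, n):
    """Binomial probability of exactly k successes in n tries."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


print(round(at_least_one(0.3, 5), 3))   # 0.832
print(round(exactly(0, 0.3, 5), 3))     # 0.168
```
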
  87. http://farm1.static.flickr.com/73/163450213_18478d3aa6_d.jpg

  88. Find me on Twitter: @jaytaph
    Find me for development and training: www.noxlogic.nl / www.techademy.nl
    Find me on email: jthijssen@noxlogic.nl
    Find me for blogs: www.adayinthelifeof.nl