Make Machine Learning Boring Again: Best Practices for Using Machine Learning in Businesses - LA Data Science Meetup - Playa Vista, August 2019

szilard
July 20, 2019

Transcript

  1. 1.

    Make Machine Learning Boring Again: Best Practices for Using Machine Learning in Businesses

    Szilard Pafka, PhD, Chief Scientist, Epoch. LA Data Science Meetup, Aug 2019
  2. 2.
  3. 3.

    Disclaimer: I am not representing my employer (Epoch) in this talk.

    I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 11.
  10. 12.
  11. 13.
  12. 14.
  13. 17.
  14. 18.
  15. 19.
  16. 20.
  17. 21.
  18. 22.
  19. 23.
  20. 24.
  21. 25.
  22. 26.
  23. 27.
  24. 28.
  25. 29.
  26. 30.
  27. 31.
  28. 32.
  29. 33.
  30. 34.

    *

  31. 36.
  32. 37.
  33. 38.
  34. 39.
  35. 40.
  36. 42.
  37. 43.
  38. 44.
  39. 45.
  40. 46.
  41. 47.
  42. 48.
  43. 50.
  44. 51.

    10x

  45. 52.
  46. 53.
  47. 54.
  48. 55.
  49. 56.
  50. 57.
  51. 58.
  52. 59.
  53. 61.
  54. 62.
  55. 63.
  56. 64.
  57. 65.
  58. 66.
  59. 67.
  60. 68.
  61. 69.
  62. 70.
  63. 71.
  64. 73.
  65. 74.
  66. 75.
  67. 76.
  68. 77.
  69. 78.
  70. 79.
  71. 80.
  72. 81.
  73. 82.
  74. 83.
  75. 84.
  76. 85.
  77. 86.
  78. 88.
  79. 91.
  80. 92.
  81. 95.
  82. 100.
  83. 103.
  84. 107.
  85. 108.
  86. 109.
  87. 110.
  88. 111.
  89. 113.
  90. 114.
  91. 115.
  92. 116.
  93. 117.
  94. 118.
  95. 119.
  96. 120.
  97. 121.
  98. 124.

    ML training: lots of CPU cores, lots of RAM, limited time

    ML scoring: separate servers
  99. 126.
  100. 127.

    “people that know what they’re doing just use open source [...] the same open source tools that the MLaaS services offer” - Bradford Cross
  101. 128.
  102. 129.
  103. 130.

    already pre-processed data; less domain knowledge (or deliberately hidden); AUC 0.0001 increases "relevant"; no business metric; no actual deployment; models too complex; no online evaluation; no monitoring; data leakage
  104. 133.
  105. 135.

    [Figure: benchmark timings, time [s]. Panels: Aggregation (100M rows, 1M groups); Join (100M rows x 1M rows).]

    “Motherfucka!”
  106. 136.
  107. 138.
  108. 139.
  109. 140.

    AI?

  110. 141.
  111. 142.
  112. 143.
  113. 145.
  114. 146.
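
One of the slides above lists `AUC 0.0001 increases "relevant"` among its concerns. A possible reading is that such a small gain is far below the sampling noise of the metric itself. The sketch below is not from the talk: it is a minimal illustration, on synthetic data, of how a bootstrap interval around AUC can be compared with a 0.0001 difference. The function names (`auc`, `bootstrap_auc_interval`), the data-generating process, and the sample sizes are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum formula; tied scores get their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    n_pos = 0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
                n_pos += 1
        i = j
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_interval(labels, scores, n_boot=200, seed=0):
    """Approximate 95% percentile interval for AUC by resampling rows."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip the (rare) resamples that contain only one class.
        if 0 < sum(resampled_labels) < n:
            stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Synthetic data (an assumption): positives score about 1 unit higher on average.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(2000)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_auc_interval(labels, scores)
width = high - low

print(f"AUC = {point:.4f}, approx. 95% interval = [{low:.4f}, {high:.4f}]")
print(f"interval width is {width / 0.0001:.0f}x a 0.0001 AUC difference")
```

With a few thousand rows the interval is typically orders of magnitude wider than 0.0001, which is one way to check whether a reported improvement is distinguishable from noise before tying it to a business metric.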