Make Machine Learning Boring Again: Best Practices for Using Machine Learning in Businesses - LA Data Science Meetup - Playa Vista, August 2019

szilard
July 20, 2019

Transcript

  1. Make Machine Learning Boring Again: Best Practices for Using Machine Learning in Businesses. Szilard Pafka, PhD, Chief Scientist, Epoch. LA Data Science Meetup, Aug 2019
  2. None
  3. Disclaimer: I am not representing my employer (Epoch) in this talk. I cannot confirm nor deny if Epoch is using any of the methods, tools, results etc. mentioned in this talk
  4. None
  5. None
  6. None
  7. None
  8. None
  9. y = f (x1, x2, ... , xn) Source: Hastie et al., ESL, 2nd ed.
  10. y = f (x1, x2, ... , xn)

  11. None
  12. None
  13. None
  14. None
  15. #1 Use the Right Algo

  16. Source: Andrew Ng

  17. None
  18. None
  19. None
  20. None
  21. None
  22. None
  23. None
  24. None
  25. None
  26. None
  27. None
  28. None
  29. None
  30. None
  31. None
  32. None
  33. None
  34. *

  35. #2 Use Open Source

  36. None
  37. None
  38. None
  39. None
  40. None
  41. in 2006: - cost was not a factor! - data.frame - [800] packages
  42. None
  43. None
  44. None
  45. None
  46. None
  47. None
  48. None
  49. #3 Simple > Complex

  50. None
  51. 10x

  52. None
  53. None
  54. None
  55. None
  56. None
  57. None
  58. None
  59. None
  60. #4 Incorporate Domain Knowledge. Do Feature Engineering (Still). Explore Your Data. Clean Your Data
  61. None
  62. None
  63. None
  64. None
  65. None
  66. None
  67. None
  68. None
  69. None
  70. None
  71. None
  72. #5 Do Proper Validation. Avoid: Overfitting, Data Leakage

  73. None
  74. None
  75. None
  76. None
  77. None
  78. None
  79. None
  80. None
  81. None
  82. None
  83. None
  84. None
  85. None
  86. None
  87. #6 Batch or Real-Time Scoring?

  88. None
  89. https://medium.com/@HarlanH/patterns-for-connecting-predictive-models-to-software-products-f9b6e923f02d

  90. https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280 your app

  91. None
  92. None
  93. R/Python: - Slow(er) - Encoding of categorical variables

  94. #7 Do Online Validation as Well

  95. None
  96. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation

  97. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation

  98. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation https://www.slideshare.net/FaisalZakariaSiddiqi/netflix-recommendations-feature-engineering-with-time-travel

  99. #8 Monitor Your Models

  100. None
  101. https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/

  102. https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/

  103. None
  104. 20% 80% (my guess)

  105. 20% 80% (my guess)

  106. #9 Business Value: Seek / Measure / Sell

  107. None
  108. None
  109. None
  110. None
  111. None
  112. #10 Make it Reproducible

  113. None
  114. None
  115. None
  116. None
  117. None
  118. None
  119. None
  120. None
  121. None
  122. Cloud (servers)

  123. ML training: lots of CPU cores, lots of RAM, limited time
  124. ML training: lots of CPU cores, lots of RAM, limited time. ML scoring: separate servers
  125. ML (cloud) services (MLaaS)

  126. None
  127. “people that know what they’re doing just use open source [...] the same open source tools that the MLaaS services offer” - Bradford Cross
  128. Kaggle

  129. None
  130. already pre-processed data; less domain knowledge (or deliberately hidden); AUC 0.0001 increases "relevant"; no business metric; no actual deployment; models too complex; no online evaluation; no monitoring; data leakage
  131. Tuning and Auto ML

  132. Ben Recht, Kevin Jamieson: http://www.argmin.net/2016/06/20/hypertuning/

  133. GPUs

  134. Aggregation: 100M rows, 1M groups (time [s]). Join: 100M rows x 1M rows (time [s])
  135. Aggregation: 100M rows, 1M groups (time [s]). Join: 100M rows x 1M rows (time [s]). “Motherfucka!”
  136. None
  137. API and GUIs

  138. None
  139. None
  140. AI?

  141. None
  142. None
  143. None
  144. How to Start?

  145. None
  146. None
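
Slide 93 lists "encoding of categorical variables" as a concern when scoring from R/Python. One possible reading is that the scoring path has to apply exactly the same category-to-number mapping that was used in training, including for categories it has never seen. The slides do not show how this is done, so the following is only a minimal sketch under that assumption. The names `fit_encoder`, `encode`, and `UNKNOWN`, and the example country codes, are hypothetical and not taken from the talk.

```python
import json

# Assumption: categories never seen in training are mapped to a reserved index.
UNKNOWN = -1


def fit_encoder(values):
    """Build a category -> integer mapping from the training data only."""
    return {cat: idx for idx, cat in enumerate(sorted(set(values)))}


def encode(mapping, value):
    """Apply the training-time mapping; fall back to UNKNOWN for new categories."""
    return mapping.get(value, UNKNOWN)


# Training side: learn the mapping once (illustrative data).
train_countries = ["US", "DE", "US", "FR"]
mapping = fit_encoder(train_countries)

# Persist the mapping with the model so the scoring service reuses it verbatim.
serialized = json.dumps(mapping)

# Scoring side: load the persisted mapping instead of re-deriving it.
scoring_mapping = json.loads(serialized)
scored = [encode(scoring_mapping, c) for c in ["FR", "US", "BR"]]
print(scored)
```

Persisting the mapping rather than recomputing it at scoring time keeps the indices stable even when the scoring traffic contains a different set of categories than the training data.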