Make Machine Learning Boring Again: Best Practices for Using Machine Learning in Businesses - Albuquerque Machine Learning Meetup (Online) - Aug 2020

szilard
August 16, 2020

Transcript

  1. Make Machine Learning Boring Again: Best
    Practices for Using Machine Learning in
    Businesses
    Szilard Pafka, PhD
    Chief Scientist, Epoch
    Albuquerque Machine Learning Meetup (Online)
    Aug 2020

  2. Disclaimer:
    I am not representing my employer (Epoch) in this talk
I can neither confirm nor deny whether Epoch is using any of the methods, tools,
    results, etc. mentioned in this talk

  3. y = f(x1, x2, ..., xn)
    Source: Hastie et al., ESL 2nd ed.

  4. y = f(x1, x2, ..., xn)

  5. #1 Use the Right Algo
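
A hedged illustration of what "the right algo" often means for tabular business data: gradient boosted trees as the default. This sketch is mine, not the talk's; the file name and columns ("customers.csv", "churned") are hypothetical placeholders.

```python
# Sketch: GBM as a strong default for tabular data.
import lightgbm as lgb
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")                   # hypothetical dataset
X, y = df.drop(columns=["churned"]), df["churned"]  # hypothetical target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```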

  6. Source: Andrew Ng

  7. #2 Use Open Source

  8. R in 2006:
    - cost was not a factor!
    - data.frame
    - ~800 packages

  9. #3 Simple > Complex

  10. #4 Incorporate Domain Knowledge
    Do Feature Engineering (Still)
    Explore Your Data
    Clean Your Data
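
A small sketch of slide 10 in practice: explore, clean, then bake domain knowledge into explicit features. The column names (signup_date, last_order_date, n_orders, revenue) are hypothetical.

```python
import pandas as pd

df = pd.read_csv("orders.csv", parse_dates=["signup_date", "last_order_date"])

# Explore and clean first: inspect distributions, drop impossible values.
print(df.describe())
df = df[df["revenue"] >= 0]

# Encode domain knowledge as features the model need not rediscover.
df["tenure_days"] = (df["last_order_date"] - df["signup_date"]).dt.days
df["avg_order_value"] = df["revenue"] / df["n_orders"].clip(lower=1)
```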

  11. #5 Do Proper Validation
    Avoid: Overfitting, Data Leakage
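
A minimal sketch of leakage-safe validation: if the model will score future data, split by time rather than at random, and fit all preprocessing on the training part only. The DataFrame and its date column are hypothetical.

```python
# Time-based split to avoid look-ahead leakage.
cutoff = df["event_date"].quantile(0.8)
train = df[df["event_date"] <= cutoff]   # fit on the past
test = df[df["event_date"] > cutoff]     # validate on the "future"
# Fit imputers/encoders/scalers on `train` only, then apply them unchanged
# to `test`; fitting them on all rows leaks test information into training.
```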

  12. #5+ Model Debugging
    Un-Black Boxing/Understanding,
    Interpretability, Fairness
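
One concrete way to start un-black-boxing a model: permutation importance on held-out data. A sketch assuming the fitted `model` and the `X_te`/`y_te` split from the earlier sketch.

```python
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                n_repeats=10, random_state=42)
for name, imp in sorted(zip(X_te.columns, result.importances_mean),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.4f}")   # big AUC drop = feature the model relies on
```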

  13. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day
    Readmission - Rich Caruana et al.
    "On one of the pneumonia datasets, the rule-based system learned the rule
    “HasAsthma(x) ⇒ LowerRisk(x)”, i.e., that patients who have a history of asthma have
    lower risk of dying from pneumonia than the general population"
    "patients with a history of asthma usually were admitted not only to the hospital but
    directly to the ICU (Intensive Care Unit). [...] the aggressive care received by asthmatic
    patients was so effective that it lowered their risk of dying from pneumonia compared to
    the general population"
    "models trained on the data incorrectly learn that asthma lowers risk, when in fact
    asthmatics have much higher risk (if not hospitalized)"
    "The logistic regression model also learned that having asthma lowered risk, but this
    could easily be corrected by changing the weight on the asthma feature from negative
    to positive (or to zero)."
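
The correction described in the last sentence could look like this for a scikit-learn logistic model. A toy sketch only; `logreg` is a fitted LogisticRegression and `asthma_idx` is the hypothetical column index of the asthma feature.

```python
coef = logreg.coef_[0]
print("asthma coefficient:", coef[asthma_idx])  # negative = "lowers risk"
logreg.coef_[0, asthma_idx] = 0.0               # neutralize the spurious effect
```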

  14. #6 Batch or Real-Time Scoring?

  15. https://medium.com/@HarlanH/patterns-for-connecting-predictive-models-to-software-products-f9b6e923f02d

  16. https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280
    your app
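
In the spirit of the Flask article above, a minimal real-time scoring service might look like this. A sketch only: the model file name and the input format are hypothetical.

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:      # hypothetical pickled model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    rows = request.get_json()["features"]          # e.g. [[1.2, 0.4, 3.1]]
    preds = model.predict_proba(rows)[:, 1]
    return jsonify(predictions=preds.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```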

  17. R/Python:
    - Slow(er)
    - Encoding of categ. variables
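
The encoding pitfall above: at scoring time the categorical encoding must reproduce the training-time mapping exactly, or real-time scores silently diverge from offline ones. A pandas sketch; the `plan` column is hypothetical.

```python
import pandas as pd

# Freeze the category levels seen at training time...
train_levels = pd.Categorical(train["plan"]).categories

def encode_for_scoring(df):
    # ...and reuse them at scoring time; unseen levels become code -1
    # instead of silently shifting the whole mapping.
    df["plan"] = pd.Categorical(df["plan"], categories=train_levels).codes
    return df
```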

  18. #7 Do Online Validation as Well
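
Offline metrics on a holdout set are not the end of the story: the new model should also beat the old one online, on a business metric, in an A/B test. A sketch of the readout; the counts are made up for illustration.

```python
from scipy.stats import chi2_contingency

#         converted, not converted   (illustrative numbers only)
counts = [[1200, 48800],    # control: current model
          [1320, 48680]]    # treatment: new model
chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"p-value: {p_value:.4f}")  # small p-value: lift unlikely to be chance
```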

  19. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation

  20. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation

  21. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation
    https://www.slideshare.net/FaisalZakariaSiddiqi/netflix-recommendations-feature-engineering-with-time-travel

  22. #8 Monitor Your Models

  23. https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/

  24. https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/
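
One common monitoring check (a standard technique, not necessarily the one used in the post above) is the Population Stability Index between the training-time score distribution and the live one.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

# Common rule of thumb (not from this talk): < 0.1 stable,
# 0.1-0.25 worth investigating, > 0.25 likely drift -- alert/retrain.
```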

  25. [chart: 20% / 80% split (my guess)]

  26. [chart: 20% / 80% split (my guess)]

  27. #9 Business Value
    Seek / Measure / Sell

  28. #10 Make it Reproducible
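
A minimal reproducibility sketch: fix the seeds and record the library versions, data location, and seed next to every trained model. The file names and data path are hypothetical.

```python
import json
import random
import sys

import numpy as np
import sklearn

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

metadata = {
    "python": sys.version,
    "numpy": np.__version__,
    "sklearn": sklearn.__version__,
    "seed": SEED,
    "training_data": "s3://my-bucket/train-2020-08-16.csv",  # hypothetical
}
with open("model_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```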

  29. #11 Use the Cloud (Virtual Servers)

  30. ML training:
    lots of CPU cores
    lots of RAM
    limited time

  31. ML training:
    lots of CPU cores
    lots of RAM
    limited time
    ML scoring:
separate servers

  32. #12 Don’t Use ML (cloud) services
    (MLaaS)

  33. “the people that know what they’re doing just use open source, and the
    people that don’t will not get anything to work, ever, even with APIs.”
    https://bradfordcross.com/five-ai-startup-predictions-for-2017/

  34. #13 Use High-Level APIs
    but not GUIs
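
One reading of "high-level APIs, not GUIs" (my interpretation, not necessarily the talk's): something like a scikit-learn Pipeline keeps the whole flow in one scriptable object that can be diffed, reviewed, and versioned, which a GUI workflow cannot.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Preprocessing + model as a single reproducible object; reuses the
# X_tr/y_tr split from the earlier sketch.
pipe = make_pipeline(SimpleImputer(), StandardScaler(), LogisticRegression())
pipe.fit(X_tr, y_tr)
scores = pipe.predict_proba(X_te)[:, 1]
```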

  35. #14 Kaggle Doesn’t Matter (Mostly)

  36. already pre-processed data
    less domain knowledge
    (or deliberately hidden)
AUC increases of 0.0001 deemed "relevant"
    no business metric
    no actual deployment
    models too complex
    no online evaluation
    no monitoring
    data leakage

  37. #15 GPUs (Depends)

  38. [benchmark charts: “Aggregation 100M rows, 1M groups” and “Join 100M rows x 1M rows”; y-axis: time [s]]

  39. [the same benchmark charts; y-axis: time [s]]
    “Motherfucka!”

  40. #16 Tuning and AutoML (Depends)

  41. Ben Recht, Kevin Jamieson: http://www.argmin.net/2016/06/20/hypertuning/
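
The Recht/Jamieson post linked above argues that plain random search is a surprisingly strong tuning baseline. A sketch with scikit-learn; the parameter ranges are illustrative, and X_tr/y_tr come from the earlier sketch.

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "n_estimators": randint(100, 1000),
        "max_depth": randint(2, 10),
        "learning_rate": uniform(0.01, 0.2),
    },
    n_iter=30, scoring="roc_auc", cv=5, random_state=42,
)
search.fit(X_tr, y_tr)
print(search.best_params_, search.best_score_)
```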

  42. https://arxiv.org/pdf/1907.00909.pdf

  43. “There is no AutoML system which consistently
    outperforms all others. On some datasets, the performance
    differences can be significant, but on others the AutoML
    methods are only marginally better than a Random Forest.
    On 2 datasets, all frameworks perform worse than a
    Random Forest.”

  44. Winner stability in data
    science competitions
    Test Set N=100K, Models M=1000

  45. Winner stability in data
    science competitions
    Test Set N=100K, Models M=3000

  46. Winner stability in data
    science competitions
    Test Set N=10K, Models M=1000

  47. Winner stability in data
    science competitions
    Test Set N=10K, Models M=3000
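
A sketch of how a simulation like the one on these four slides could be reproduced (my reconstruction, not the author's code): M models with identical true accuracy are scored on a test set of size N, and the leaderboard "winner" turns out to be largely luck, more so for smaller N and larger M.

```python
import numpy as np

rng = np.random.default_rng(42)

def winners(N, M, p=0.9, trials=100):
    # Observed accuracies of M equally good models on a size-N test set;
    # return the index of the apparent "winner" in each trial.
    acc = rng.binomial(N, p, size=(trials, M)) / N
    return acc.argmax(axis=1)

for N, M in [(100_000, 1000), (100_000, 3000), (10_000, 1000), (10_000, 3000)]:
    w = winners(N, M)
    print(f"N={N:>6}, M={M}: {len(set(w))} distinct winners in 100 trials")
```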

  48. Meta: Ignore the Hype

  49. How to Start?
