Make Machine Learning Boring Again: Best Practices for Using Machine Learning in Businesses - LA Data Science Meetup - Playa Vista, August 2019

szilard
July 20, 2019

Transcript

  1. Make Machine Learning Boring Again: Best
    Practices for Using Machine Learning in
    Businesses
    Szilard Pafka, PhD
    Chief Scientist, Epoch
    LA Data Science Meetup
    Aug 2019

  2. Disclaimer:
    I am not representing my employer (Epoch) in this talk
    I can neither confirm nor deny whether Epoch is using any of the methods,
    tools, results, etc. mentioned in this talk

  3. y = f(x1, x2, ..., xn)
    Source: Hastie et al., ESL, 2nd ed.

  4. y = f(x1, x2, ..., xn)

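Slides 3 and 4 frame supervised learning as estimating an unknown function f that maps inputs x1, ..., xn to an outcome y. A minimal sketch, assuming a single input and a linear form f(x) = a + b*x (the slides do not commit to any model family), fits f from example pairs by ordinary least squares:

```python
from statistics import mean

def fit_linear(xs, ys):
    """Fit y ~ a + b*x by ordinary least squares (one input variable)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def predict(model, x):
    """Evaluate the fitted f at a new input x."""
    a, b = model
    return a + b * x
```

Calling `fit_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])` recovers `(1.0, 2.0)`, i.e. y = 1 + 2x. Real business problems have many inputs and usually call for a library model, but the idea of learning f from (x, y) pairs is the same.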

  5. #1 Use the Right Algo

  6. Source: Andrew Ng

  7. #2 Use Open Source

  8. in 2006
    - cost was not a factor!
    - data.frame
    - [800] packages

  9. #3 Simple > Complex

  10. #4 Incorporate Domain Knowledge
    Do Feature Engineering (Still)
    Explore Your Data
    Clean Your Data

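Slide 10 calls for domain knowledge, feature engineering, data exploration and data cleaning. A minimal sketch, assuming hypothetical raw records with a transaction timestamp and a possibly missing amount (the field names are illustrative, not from the talk): cleaning fills missing amounts with the training median and keeps a missing-value flag, and feature engineering derives calendar features that domain knowledge might suggest matter.

```python
from datetime import datetime
from statistics import median

def clean_and_engineer(records, amount_fill):
    """Turn raw records into model features (field names are hypothetical)."""
    rows = []
    for r in records:
        ts = datetime.fromisoformat(r["timestamp"])
        amount = r.get("amount")
        rows.append({
            # Cleaning: impute missing amounts, but keep a flag so the model can see it
            "amount": amount if amount is not None else amount_fill,
            "amount_missing": int(amount is None),
            # Feature engineering: calendar features derived from the raw timestamp
            "hour": ts.hour,
            "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
            "is_weekend": int(ts.weekday() >= 5),
        })
    return rows

train = [
    {"timestamp": "2019-08-03T14:30:00", "amount": 20.0},
    {"timestamp": "2019-08-05T09:15:00", "amount": None},
    {"timestamp": "2019-08-06T23:05:00", "amount": 40.0},
]
# Fill value estimated on training data only; the same value is reused when scoring
fill = median(r["amount"] for r in train if r["amount"] is not None)
features = clean_and_engineer(train, fill)
```

Because the fill value is computed once on training data and passed in, training and scoring apply identical cleaning.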

  11. #5 Do Proper Validation
    Avoid: Overfitting, Data Leakage

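Slide 11 warns against overfitting and data leakage. One common safeguard, sketched here under the assumption that rows carry a time index `t` (the talk does not prescribe a specific scheme), is to split by time rather than at random, and to estimate any preprocessing statistics on the training part only, so nothing from the test period leaks into training.

```python
from statistics import mean, pstdev

def time_split(rows, cutoff):
    """Train on rows strictly before the cutoff, test on rows at or after it."""
    train = [r for r in rows if r["t"] < cutoff]
    test = [r for r in rows if r["t"] >= cutoff]
    return train, test

def fit_scaler(values):
    """Standardization parameters, estimated from training values only."""
    mu, sigma = mean(values), pstdev(values)
    return mu, (sigma if sigma > 0 else 1.0)

def transform(values, scaler):
    """Apply previously fitted parameters; never refit on test data."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]

rows = [{"t": t, "x": float(x)} for t, x in enumerate([1, 2, 3, 4, 100, 200])]
train, test = time_split(rows, cutoff=4)
scaler = fit_scaler([r["x"] for r in train])  # test rows never touch the fit
x_test = transform([r["x"] for r in test], scaler)
```

A random split of time-ordered data, or scaling with statistics computed on all rows, would let the test period influence training and make offline results look better than they will be in production.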

  12. #6 Batch or Real-Time Scoring?

  13. https://medium.com/@HarlanH/patterns-for-connecting-predictive-models-to-software-products-f9b6e923f02d

  14. https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280
    your app

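Slides 12 to 14 contrast batch scoring with real-time scoring inside an application. A minimal sketch, assuming a hypothetical already-trained logistic model stored as plain coefficients (feature names and values are invented): one scoring function serves both a batch path over many rows and a real-time path that takes a JSON request body, so the two paths cannot silently diverge. Web-framework wiring, such as the Flask setup in the linked post, is left out.

```python
import json
import math

# Hypothetical model parameters; in practice these would be loaded from a saved artifact
MODEL = {"intercept": -1.0, "weights": {"age": 0.02, "visits": 0.3}}

def score(features, model=MODEL):
    """Logistic score for one row of features (missing features count as 0)."""
    z = model["intercept"] + sum(
        w * features.get(name, 0.0) for name, w in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def score_batch(rows):
    """Batch path: score many rows at once, e.g. in a scheduled job writing to a table."""
    return [score(r) for r in rows]

def handle_request(body):
    """Real-time path: score one JSON request body and return a JSON response."""
    features = json.loads(body)
    return json.dumps({"score": score(features)})
```

For example, `handle_request('{"age": 50, "visits": 0}')` returns a JSON string with a score of 0.5, the same value the batch path produces for that row.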

  15. R/Python:
    - Slow(er)
    - Encoding of categorical variables

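Slide 15 lists the encoding of categorical variables among the practical issues of scoring with R or Python. A common failure is encoding categories one way at training time and another way at scoring time. The sketch below (one possible approach, not necessarily the one the talk has in mind) learns the category-to-column mapping once from training data, reuses it for scoring, and sends levels never seen in training to an explicit extra column:

```python
def fit_one_hot(values):
    """Learn a fixed column order for a categorical variable from training data."""
    levels = sorted(set(values))
    return {level: i for i, level in enumerate(levels)}

def transform_one_hot(value, mapping):
    """Encode one value; levels not seen in training go to a final 'other' slot."""
    vec = [0] * (len(mapping) + 1)
    vec[mapping.get(value, len(mapping))] = 1
    return vec

mapping = fit_one_hot(["CA", "NY", "TX", "CA"])
print(transform_one_hot("NY", mapping))  # [0, 1, 0, 0]
print(transform_one_hot("WA", mapping))  # [0, 0, 0, 1]
```

The mapping would be saved alongside the model so that the scoring service uses exactly the same columns.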

  16. #7 Do Online Validation as Well

  17. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation

  18. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation

  19. https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation
    https://www.slideshare.net/FaisalZakariaSiddiqi/netflix-recommendations-feature-engineering-with-time-travel

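Slides 16 to 19 argue for validating models online (on live traffic) and not only offline. One standard way to read such an experiment is a two-proportion z-test comparing, for example, the conversion rate of users served by a new model with that of users served by the current one. The counts below are made up, and the choice of this test is an assumption; the slides do not name a specific method.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: control (current model) vs. treatment (new model)
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

With these numbers the test gives z of roughly 2.8 and a two-sided p-value of roughly 0.005.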

  20. #8 Monitor Your Models

  21. https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/

  22. https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/

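Slides 20 to 22 cover monitoring models after deployment. One widely used check, sketched here as an illustration rather than the method described in the linked post, is the population stability index (PSI). It compares the binned distribution of a feature or of model scores in production against a reference sample such as the training data, summing (a_i - e_i) * ln(a_i / e_i) over bins. The bin edges, the small floor for empty bins and the often-quoted alert levels (around 0.1 for a moderate shift, 0.25 for a major one) are conventions, not fixed rules.

```python
import math
from bisect import bisect_right

def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between a reference sample and a live sample.

    `edges` are interior bin boundaries; values fall into len(edges) + 1 bins.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]  # floor avoids log(0) on empty bins

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

edges = [0.25, 0.5, 0.75]
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
live_scores = [0.6, 0.7, 0.8, 0.9, 0.85, 0.95, 0.65, 0.3]
drift = psi(train_scores, live_scores, edges)
```

Here the live scores have shifted toward the upper bins, so `drift` comes out far above 0.25, while comparing the training scores with themselves gives exactly 0.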

  23. 20%
    80%
    (my guess)

  24. 20%
    80%
    (my guess)

  25. #9 Business Value
    Seek / Measure / Sell

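Slide 25 asks practitioners to seek, measure and sell business value. Measuring usually means translating model decisions into money. As a sketch, assume a hypothetical retention campaign in which each contacted customer costs $8 and each contacted customer who would otherwise have churned is worth $20 (both numbers, and the assumption that a contact always retains a churner, are invented for illustration). The net value then depends on the score threshold used to pick whom to contact:

```python
def campaign_value(scores, outcomes, threshold, gain_per_hit=20.0, cost_per_contact=8.0):
    """Net value of contacting everyone whose score is at or above `threshold`.

    outcomes[i] is 1 if customer i would have churned, else 0.
    gain_per_hit and cost_per_contact are hypothetical business parameters.
    """
    contacted = [y for s, y in zip(scores, outcomes) if s >= threshold]
    hits = sum(contacted)
    return hits * gain_per_hit - len(contacted) * cost_per_contact

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
outcomes = [1, 0, 1, 0, 1, 0]
for t in (0.0, 0.5, 0.75):
    print(t, campaign_value(scores, outcomes, t))
# 0.0 12.0
# 0.5 16.0
# 0.75 4.0
```

In this toy example a middle threshold beats both contacting everyone and contacting only the top scorers, something a threshold-free metric such as AUC alone would not reveal.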

  26. #10 Make it Reproducible

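Slide 26 asks to make the work reproducible. A small part of that, sketched below with the standard library only, is to drive randomness from an explicit seed and to record what a result depended on: a hash of the input data, the interpreter version, the parameters and the seed. The manifest fields are a hypothetical layout, not a standard format; in practice one would also pin package versions and keep code under version control.

```python
import hashlib
import json
import platform
import random

def reproducible_split(n_rows, seed=42, test_fraction=0.25):
    """Shuffle row indices with a dedicated, seeded RNG and split into train/test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # same seed -> same order on every rerun
    n_test = int(n_rows * test_fraction)
    return idx[n_test:], idx[:n_test]

def make_manifest(data_bytes, params, seed):
    """Record what a result depended on (hypothetical field layout)."""
    return json.dumps(
        {
            "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
            "python": platform.python_version(),
            "params": params,
            "seed": seed,
        },
        sort_keys=True,
        indent=2,
    )

train_idx, test_idx = reproducible_split(1_000, seed=7)
print(make_manifest(b"id,amount\n1,20.0\n", {"max_depth": 6}, seed=7))
```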

  27. Cloud (servers)

  28. ML training:
    lots of CPU cores
    lots of RAM
    limited time

  29. ML training:
    lots of CPU cores
    lots of RAM
    limited time
    ML scoring:
    separate servers

  30. ML (cloud) services (MLaaS)

  31. “people that know what they’re doing just
    use open source [...] the same open
    source tools that the MLaaS services offer”
    - Bradford Cross

  32. already pre-processed data
    less domain knowledge (or it is deliberately hidden)
    AUC increases of 0.0001 treated as "relevant"
    no business metric
    no actual deployment
    models too complex
    no online evaluation
    no monitoring
    data leakage

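Slide 32 criticizes, among other things, treating AUC increases of 0.0001 as relevant. For reference, AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. A direct sketch of that definition (quadratic in the number of examples, so only suitable for small samples):

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (O(n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```

On a test set of a few thousand examples, the sampling variability of AUC is typically much larger than 0.0001, so differences of that size are usually indistinguishable from noise.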

  33. Tuning and Auto ML

  34. Ben Recht, Kevin Jamieson: http://www.argmin.net/2016/06/20/hypertuning/

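Slides 33 and 34 deal with hyperparameter tuning and AutoML; the cited post by Ben Recht and Kevin Jamieson compares random search with Bayesian optimization. Random search itself takes only a few lines. The sketch below assumes a hypothetical `evaluate` function that trains a model with a given configuration and returns a validation score; a toy function stands in for it here, and the hyperparameter names and ranges are illustrative only:

```python
import math
import random

def random_search(evaluate, space, n_trials, seed=0):
    """Sample n_trials configurations from `space` and keep the best one.

    `space` maps a hyperparameter name to a (low, high, log_scale) tuple.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, -math.inf
    for _ in range(n_trials):
        cfg = {}
        for name, (low, high, log_scale) in space.items():
            if log_scale:  # uniform in log space, e.g. for learning rates
                cfg[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                cfg[name] = rng.uniform(low, high)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy stand-in for "train a model, return validation score"; best at lr=0.1, subsample=0.8
def toy_evaluate(cfg):
    return -(math.log10(cfg["learning_rate"]) + 1) ** 2 - (cfg["subsample"] - 0.8) ** 2

space = {"learning_rate": (0.001, 1.0, True), "subsample": (0.5, 1.0, False)}
best_cfg, best_score = random_search(toy_evaluate, space, n_trials=200)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which is usually what one wants for scale-like parameters.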

  35. [Benchmark charts: "Aggregation 100M rows 1M groups" and "Join 100M rows x 1M rows"; y-axis: time [s]]

  36. [Benchmark charts: "Aggregation 100M rows 1M groups" and "Join 100M rows x 1M rows"; y-axis: time [s]]
    “Motherfucka!”

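Slides 35 and 36 show timings (in seconds) for two data-manipulation benchmarks: aggregating 100M rows into 1M groups, and joining a 100M-row table with a 1M-row table. To make the two operations concrete, here is a pure-Python sketch at a much smaller, hypothetical size; it only illustrates what is being measured and says nothing about the performance of the tools in the charts.

```python
import random
import time
from collections import defaultdict

def group_sum(keys, values):
    """Aggregation: sum of values per group key (hash-based group-by)."""
    totals = defaultdict(float)
    for k, v in zip(keys, values):
        totals[k] += v
    return dict(totals)

def hash_join(left, right):
    """Inner join of (key, left_value) rows with a key -> right_value lookup table."""
    return [(k, lv, right[k]) for k, lv in left if k in right]

rng = random.Random(1)
n_rows, n_groups = 100_000, 1_000  # scaled down from 100M rows / 1M groups
keys = [rng.randrange(n_groups) for _ in range(n_rows)]
values = [rng.random() for _ in range(n_rows)]

start = time.perf_counter()
totals = group_sum(keys, values)
print(f"aggregation: {time.perf_counter() - start:.3f} s")

right = {k: k * 10 for k in range(n_groups)}  # smaller table: one row per key
start = time.perf_counter()
joined = hash_join(zip(keys, values), right)
print(f"join: {time.perf_counter() - start:.3f} s")
```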

  37. API and GUIs

  38. How to Start?
