Avoiding ML FOBO - PyDataBerlin 2019

Presenters: Rachel Berryman, Dânia Meira
FOMO is the fear of missing out. FOBO is similar: the fear of a better option. FOBO gives a name to the spiral we fall into when we obsessively research every possible option before a decision, fearing we’ll miss out on the “best” one. When starting a new machine learning project, the knowledge that we'll never be able to examine every possible algorithm, package, tool, or technology before deciding can be overwhelming, and it can easily block us. What if we make the wrong decision and don't bring enough value? What if what we choose isn't "state-of-the-art"? The first solutions that come to mind are often the “most-hyped” options, such as deep learning (DL), although those are not always the best fit. How should you decide what to use?

We will present a practical roadmap to guide your Data Science projects: what to focus on first (probably cleaning data and feature engineering), which algorithms to try first (hint: not neural networks!), and tips for convincing business leaders to focus on what works, not on the hype.

Dânia Meira

October 10, 2019

Transcript

  1. Who we are

    • Rachel: Manager Data Science & Analytics. Bridging the gap between code and customers. Teacher at a Data Science Bootcamp.
    • Dânia: Senior Data Scientist. ML models for predictive analytics. Volunteer at DSSG Berlin. Teacher at a Data Science Bootcamp.
    • The AI Guild is the go-to community for Data Scientists, Data Engineers, Machine Learners, and Deep Learners accelerating the adoption of AI in Europe.
  2. FOBO: Fear Of Better Options

    What it looks like:
    • Always optimizing
    • Living in a world of maybes
    • Paralyzed at the prospect of actually committing to something
    • Fear of choosing something that isn’t the absolutely perfect option
    The reality:
    → We'll never be able to examine every possible algorithm, package, tool and/or technology before making a decision.
  3. Why do Data Scientists have FOBO?

    What causes it? Data Science breeds FOBO because the field is growing and progressing so fast. As a Data Scientist, you will always have gaps in your knowledge base.
    • It is uncomfortable to feel like you don’t have all the answers.
    • New Data Scientists, especially those fresh out of academia or research, tend to focus on technology and theory, not on application.
    • Managers also focus on the hype (Why aren’t we doing deep learning? What is our ML “IP”?)
  4. I’m overwhelmed O.O

    The first solutions that come to mind are often the “most-hyped” options, although those are not always the best fitting ones.
    • How should we decide what to use?
    • What if we make the wrong decision and don't bring enough value?
    • What if what we choose to use isn't "state-of-the-art"?
  5. Don’t despair!

    We will present a practical roadmap to guide your Data Science projects:
    • What to focus on first
    • How to choose algorithms - and how to iterate on them
    • Tips for convincing business leaders to focus on what works, not on the hype
  6. ML is only a (small) part of the final solution

    See also: “Hidden Technical Debt in Machine Learning Systems” by Sculley et al., Google, Inc., 2015
    Diagram labels: ML Models, Data Collection, Data Quality, Infrastructure, Process Management Tools, Machine Resource Management, Monitoring, Configuration, Feature Extraction, Analysis, Data Preprocessing, Parameter Configuration, Offline Validation, Business Logic, A/B Testing, Data Science Environment
  7. What to focus on first

    Understand the needs and be clear on:
    • Where you are now
    • What you want to achieve
    ...But, how to get there?
  8. How to choose algorithms - and how to iterate on them

    → Seeing the ML workflow from the perspective of iteration helps to understand the big picture concepts behind machine learning
    Diagram labels: Define Goal, Evaluate Model, Prepare Data, Collect Data, Train Model, Deploy Model, Make Predictions, Monitor Predictions, Gather and Analyze Insights, Define MVP
  9. How to choose algorithms - and how to iterate on them

    • Fail-fast attitude: Debug, iterate, experiment
      ◦ You'll better understand the algorithms you work with.
      ◦ You'll anticipate more realistic timelines for your projects.
      ◦ You'll spot low hanging fruit for model improvement.
      ◦ You'll find it easier to stay motivated after poor initial results.
      ◦ You'll be able to solve bigger problems with machine learning.
  10. “Lazy Estimator” Baseline: First Step

    • Lazy Regressor - Always predict mean/median
    • Lazy Classifier - Always predict most common class
    → Any ML model used should significantly outperform the Lazy Estimator.
  11. How to choose algorithms - and how to iterate on them

    • Often, the first model you build for a problem won't be the best possible.
    • Trying Different Model Families
      ◦ “No Free Lunch”: There is no one model family that works best for every problem.
    Source: https://elitedatascience.com/machine-learning-iteration
  12. How to choose algorithms - and how to iterate on them

    Why not start with NNs?
    [Figure: “Why Deep Learning?” Slide by Andrew Ng, all rights reserved.]
  13. How to choose algorithms - and how to iterate on them

    When do NNs make sense?
    [Figure: “Results Get Better With More Data, Larger Models, More Compute.” Slide by Jeff Dean, all rights reserved.]
  14. How to choose algorithms - and how to iterate on them

    • Trying Ensemble of Models
      ◦ Combining multiple models often leads to a small performance increase over any of the individual models
    Source: https://elitedatascience.com/machine-learning-iteration
  15. How to choose algorithms - and how to iterate on them

    Fail-fast attitude: debug, iterate, experiment
    1. Start with a simple baseline model or rule-based solution
    2. Develop a very simple ML algorithm (logistic regression for example)
    3. Add incrementally:
      ◦ Collect better data, engineer better features
      ◦ Try different hyper-parameters, algorithms, ensembles
      ◦ Compare the results
  16. Tips for convincing business leaders to focus on what works, not on the hype

    • Define Use Case
    • Integrate with other teams
    • Set expectations
    From https://pair.withgoogle.com/worksheet/user-needs.pdf
  17. Tips for convincing business leaders to focus on what works, not on the hype

    • Data Science Hierarchy of Needs
      ◦ When companies try to apply AI, they have all the tools to train neural networks but no tools to develop training data.
    • Define Use Case: MVP Approach
      ◦ Start with a small, vertical section of your product and make it work well end-to-end
    • AI Project Canvas
    • Build Cross-functional execution team
  18. Summary

    For Data Scientists
    • Start with a solution that delivers value
    • Communicate with stakeholders
    • Integrate with other teams
    • Set expectations
    • Experiment and Iterate
    For Decision Makers
    • ML Workflow ≠ SW Workflow
    • Define use case and metrics
    • Integrate with other teams
    • Set expectations
    • Experiment and Iterate
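
Slides 10 and 15 recommend starting from a “lazy estimator” baseline and only then adding a very simple model. The sketch below shows one way that comparison might look in plain Python. The class names, the one-feature threshold model (a stand-in for something like logistic regression), and the toy data are all assumptions made for illustration; none of them come from the talk.

```python
from collections import Counter
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class LazyClassifier:
    """Baseline: always predict the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class LazyRegressor:
    """Baseline: always predict the mean of the training targets."""

    def fit(self, X, y):
        self.value_ = mean(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


class ThresholdStump:
    """Hypothetical 'very simple model': learn one cutoff on the first feature."""

    def _apply(self, X, cutoff):
        return [1 if row[0] > cutoff else 0 for row in X]

    def fit(self, X, y):
        # Try every observed value as a cutoff and keep the one with
        # the highest training accuracy.
        candidates = sorted({row[0] for row in X})
        self.cutoff_ = max(candidates, key=lambda c: accuracy(y, self._apply(X, c)))
        return self

    def predict(self, X):
        return self._apply(X, self.cutoff_)


def compare(models, X_train, y_train, X_test, y_test):
    """Fit each model and report held-out accuracy by name."""
    return {
        name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }


# Made-up toy data: one numeric feature, binary label.
X_train = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y_train = [0, 0, 0, 0, 1, 1]
X_test = [[1.5], [2.5], [5.5], [6.5], [7.5]]
y_test = [0, 0, 0, 1, 1]

models = {"lazy": LazyClassifier(), "stump": ThresholdStump()}
scores = compare(models, X_train, y_train, X_test, y_test)
print(scores)
```

Any candidate added in later iterations (better features, other model families, ensembles) can be put into the same `models` dictionary, so each one is judged against the lazy baseline on the same held-out data.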