Avoiding ML FOBO - PyDataBerlin 2019

Avoiding ML FOBO Rachel Berryman, Dânia Meira PyData Berlin 2019

Who we are
Rachel
• Manager Data Science & Analytics
• Bridging the gap between code and customers
• Teacher at a Data Science Bootcamp
Dânia
• Senior Data Scientist
• ML models for predictive analytics
• Volunteer at DSSG Berlin
• Teacher at a Data Science Bootcamp
The AI Guild is the go-to community for Data Scientists, Data Engineers, Machine Learners, and Deep Learners accelerating the adoption of AI in Europe.

FOBO: Fear Of Better Options
What it looks like:
• Always optimizing
• Living in a world of maybes
• Paralyzed at the prospect of actually committing to something
• Fear of choosing something that isn’t the absolutely perfect option
The reality:
→ We'll never be able to examine every possible algorithm, package, tool and/or technology before making a decision.

Machine Learning Algorithms https://vas3k.com/blog/machine_learning/

Algorithms in scikit-learn https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

Deep Learning Python libraries https://data-flair.training/blogs/deep-learning-with-python-libraries/

Why do Data Scientists have FOBO?
What causes it?
Data Science as a field breeds FOBO because of how fast the field is growing and progressing. This means as a Data Scientist, you will always have gaps in your knowledge base.
• Uncomfortable to feel like you don’t have all the answers.
• New Data Scientists, especially those fresh out of academia or research, tend to focus on technology and theory, not on application.
• Managers also focus on the hype (Why aren’t we doing deep learning? What is our ML “IP”?)

I’m overwhelmed O.O
The first solutions that come to mind are often the “most-hyped” options, although those are not always the best-fitting ones.
• How should we decide what to use?
• What if we make the wrong decision and don't bring enough value?
• What if what we choose to use isn't "state-of-the-art"?

Don’t despair!
We will present a practical roadmap to guide your Data Science projects:
• What to focus on first
• How to choose algorithms - and how to iterate on them
• Tips for convincing business leaders to focus on what works, not on the hype

What to focus on first https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007

What to focus on first https://hackernoon.com/-big-challenge-in-deep-learning-training-data-31a88b97b282

ML is only a (small) part of the final solution
See also: “Hidden Technical Debt in Machine Learning Systems” by Sculley et al., Google, Inc., 2015
Components shown in the diagram: ML Models, Data Collection, Data Quality, Infrastructure, Process Management Tools, Machine Resource Management, Monitoring, Configuration, Feature Extraction, Analysis, Data Preprocessing, Parameter Configuration, Offline Validation, Business Logic, A/B Testing, Data Science Environment

What to focus on first
Understand the needs and be clear on:
• Where you are now
• What you want to achieve
...But, how to get there?

How to choose algorithms - and how to iterate on them
→ Seeing the ML workflow from the perspective of iteration helps to understand the big-picture concepts behind machine learning
Stages in the workflow diagram: Define Goal, Evaluate Model, Prepare Data, Collect Data, Train Model, Deploy Model, Make Predictions, Monitor Predictions, Gather and Analyze Insights, Define MVP

How to choose algorithms - and how to iterate on them
• Fail-fast attitude: Debug, iterate, experiment
  ◦ You'll better understand the algorithms you work with.
  ◦ You'll anticipate more realistic timelines for your projects.
  ◦ You'll spot low-hanging fruit for model improvement.
  ◦ You'll find it easier to stay motivated after poor initial results.
  ◦ You'll be able to solve bigger problems with machine learning.

“Lazy Estimator” Baseline: First Step
• Lazy Regressor - Always predict mean/median
• Lazy Classifier - Always predict most common class
→ Any ML model used should significantly outperform the Lazy Estimator.
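
A minimal sketch of the two lazy estimators described on this slide, using only the Python standard library. The class names `LazyRegressor` and `LazyClassifier`, the helper functions, and the toy data are hypothetical and chosen for illustration; they are not taken from the talk.

```python
from collections import Counter
from statistics import mean, median


class LazyRegressor:
    """Ignores the features and always predicts the training mean or median."""

    def __init__(self, strategy="mean"):
        if strategy not in ("mean", "median"):
            raise ValueError("strategy must be 'mean' or 'median'")
        self.strategy = strategy

    def fit(self, X, y):
        self.constant_ = mean(y) if self.strategy == "mean" else median(y)
        return self

    def predict(self, X):
        return [self.constant_ for _ in X]


class LazyClassifier:
    """Ignores the features and always predicts the most common training class."""

    def fit(self, X, y):
        self.constant_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.constant_ for _ in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data; the features are placeholders because both
# estimators ignore them.
X = [[0], [1], [2], [3], [4]]
y_class = ["no", "no", "no", "yes", "no"]
y_reg = [10.0, 12.0, 11.0, 50.0, 12.0]

baseline_accuracy = accuracy(y_class, LazyClassifier().fit(X, y_class).predict(X))
baseline_mae = mean_absolute_error(
    y_reg, LazyRegressor("median").fit(X, y_reg).predict(X)
)
print(f"accuracy to beat: {baseline_accuracy:.2f}")  # 0.80
print(f"MAE to beat: {baseline_mae:.2f}")            # 8.20
```

On imbalanced data the lazy classifier can look deceptively strong (80% accuracy here without learning anything), which is why a model should be judged against it. scikit-learn ships comparable baselines as `DummyRegressor` and `DummyClassifier` in `sklearn.dummy`.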

How to choose algorithms - and how to iterate on them
• Often, the first model you build for a problem won't be the best possible.
• Trying Different Model Families
  ◦ “No Free Lunch”: There is no one model family that works best for every problem.
https://elitedatascience.com/machine-learning-iteration
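
One way to try several model families, sketched with scikit-learn under stated assumptions: the synthetic dataset from `make_classification` stands in for real data, and the four candidates and their settings are arbitrary illustrative choices, not a recommendation from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# One representative per model family, plus the lazy baseline for reference.
candidates = {
    "lazy_baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Same data, same folds, same metric for every family.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>20}: {score:.3f}")
```

Which family comes out on top depends on the data, which is the point of "No Free Lunch": the ranking has to be measured for each problem, not assumed.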

How to choose algorithms - and how to iterate on them
Why not start with NNs?
Why Deep Learning? (Slide by Andrew Ng, all rights reserved.)

How to choose algorithms - and how to iterate on them
• Trying Ensembles of Models
  ◦ Combining multiple models often leads to a small performance increase over any of the individual models
https://elitedatascience.com/machine-learning-iteration
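
A toy illustration of why combining models can help, using majority voting in plain Python. The labels and the three sets of "model predictions" are simulated by hand (an assumption made for illustration, not real model output): each simulated model is wrong on three of ten examples, and the errors only partly overlap.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_vote(*model_predictions):
    """Combine per-model predictions by taking the most common label per example."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def flip(labels, wrong_indices):
    """Simulate a model's output: the true binary labels with some entries flipped."""
    return [1 - label if i in wrong_indices else label for i, label in enumerate(labels)]


# Hypothetical labels and simulated predictions, hand-built for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_a = flip(y_true, {0, 1, 2})
model_b = flip(y_true, {2, 3, 4})
model_c = flip(y_true, {4, 5, 6})

ensemble = majority_vote(model_a, model_b, model_c)

for name, preds in [
    ("model_a", model_a),
    ("model_b", model_b),
    ("model_c", model_c),
    ("ensemble", ensemble),
]:
    print(f"{name}: {accuracy(y_true, preds):.1f}")
# model_a: 0.7, model_b: 0.7, model_c: 0.7, ensemble: 0.8
```

The vote only corrects examples where a single model is wrong; on examples 2 and 4, two models agree on the wrong label and the ensemble inherits the mistake. The gain therefore depends on how uncorrelated the individual errors are.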

How to choose algorithms - and how to iterate on them
Fail-fast attitude: debug, iterate, experiment
1. Start with a simple baseline model or rule-based solution
2. Develop a very simple ML algorithm (logistic regression for example)
3. Add incrementally:
  ◦ Collect better data, engineer better features
  ◦ Try different hyper-parameters, algorithms, ensembles
  ◦ Compare the results
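
The three steps above could be sketched with scikit-learn as follows. This is an illustration under assumptions: the data is synthetic, the `results` list is a hypothetical stand-in for an experiment log, and the hyper-parameter grid is an arbitrary example of one incremental change.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

results = []  # experiment log: (step, mean cross-validated accuracy)

# Step 1: simple baseline.
baseline = DummyClassifier(strategy="most_frequent")
results.append(("1. baseline", cross_val_score(baseline, X, y, cv=5).mean()))

# Step 2: a very simple ML model.
simple_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
results.append(
    ("2. logistic regression", cross_val_score(simple_model, X, y, cv=5).mean())
)

# Step 3: one incremental change, here a small hyper-parameter search.
search = GridSearchCV(
    simple_model,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)
results.append(("3. tuned logistic regression", search.best_score_))

# Compare the results side by side before deciding on the next iteration.
for step, score in results:
    print(f"{step:<30} {score:.3f}")
```

Each entry in the log is measured the same way, so the comparison shows whether an added step was worth its complexity. The tuned score is selected on the same folds it is reported on, so a held-out test set is still needed for a final estimate.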

Tips for convincing business leaders to focus on what works, not on the hype
• Define Use Case
• Integrate with other teams
• Set expectations
From https://pair.withgoogle.com/worksheet/user-needs.pdf

Building FOBO-Free Execution Teams

AI Project Canvas https://towardsdatascience.com/introducing-the-ai-project-canvas-e88e29eb7024

Tips for convincing business leaders to focus on what works, not on the hype
• Data Science Hierarchy of Needs
  ◦ When companies try to apply AI, they have all the tools to train neural networks but no tools to develop training data.
• Define Use Case: MVP Approach
  ◦ Start with a small, vertical section of your product and make it work well end-to-end
• AI Project Canvas
• Build Cross-functional execution team

Summary
For Data Scientists
• Start with a solution that delivers value
• Communicate with stakeholders
• Integrate with other teams
• Set expectations
• Experiment and Iterate
For Decision Makers
• ML Workflow ≠ SW Workflow
• Define use case and metrics
• Integrate with other teams
• Set expectations
• Experiment and Iterate

Thank you! https://www.theguild.ai www.linkedin.com/in/rachelberryman www.linkedin.com/in/daniameira

Results Get Better With More Data, Larger Models, More Compute
