

Demystifying the Buzz in Machine Learning! (This Time for Real)

Slides of my keynote talk at the Chief Data Officer Exchange Europe 2019.

Abstract:
When I started my data science career in 2013, everyone was into big data. In fact, big data was at the peak of inflated expectations (source: Gartner). You had to use tools like Hadoop and Spark to be one of the cool kids. Many data prophets out there told you that data is the new oil, or even gold. In 2019, things haven't changed. Data is still cool and going strong. It's eating the world, and yes, you still need big data, and now also deep, deep, very deep learning. There's a lot of bullshit bingo out there.

In this talk, I want to demystify the buzz in machine learning by presenting some simple guidelines for successful data projects and real, practical use cases. And yes, it involves deep learning, and yes, it can get quite technical at times.


Dat Tran

June 17, 2019

Transcript

  1. Demystifying the Buzz in Machine Learning! (This time for real)
     Dat Tran, Head of AI ~ 17 June 2019 ~ Berlin ~ #AxelSpringerAI
  2. Problem Statement
     • For over 50% of the lead-outs, we don't know whether users bought or not
     • We know it for Amazon & eBay, but with a two-day lag; other problems are direct vs. indirect sales
     • Predicting sales is valuable, for example for CRM, a recommendation engine and many other use cases
  3. Supervised Learning
     Training samples → ML model:
     • price: 80, pis: 5, ... → sale
     • price: 5, pis: 1, ... → non-sale
     • price: 17, pis: 3, ... → sale
     Model predictions:
     • price: 99, pis: 8, ... → non-sale
     • price: 65, pis: 2, ... → sale (82%)
     • price: 32, pis: 9, ... → sale (30%)
     • price: 40, pis: 5, ... → sale (50%)
     • price: 20, pis: 2, ... → sale (71%)
     Deep Learning????
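
To make the supervised-learning setup on this slide concrete, here is a minimal sketch of training a binary sale/non-sale classifier on tabular features such as price and pis. It is not from the deck: the sample values and the use of scikit-learn's LogisticRegression are illustrative assumptions only.

```python
# Minimal supervised-learning sketch (illustrative, not the deck's code):
# fit a binary sale / non-sale classifier on tabular features such as
# "price" and "pis", then score new, unlabeled lead-outs.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled samples: columns = [price, pis]; 1 = sale, 0 = non-sale
X_train = np.array([[80, 5], [5, 1], [17, 3]])
y_train = np.array([1, 0, 1])

model = LogisticRegression()
model.fit(X_train, y_train)

# New lead-outs to score; predict_proba returns P(non-sale), P(sale) per row
X_new = np.array([[99, 8], [65, 2], [32, 9]])
sale_probability = model.predict_proba(X_new)[:, 1]
print(sale_probability)
```

A simple baseline like this is often a reasonable first step before reaching for deep learning, in line with the deck's "think simple first" advice.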
  4. Problem Statement
     • 2,306,658 accommodations
     • 308,519,299 images
     • ~133 images per accommodation
     Humans? Deep Learning??
  5. How to start a Deep Learning project
     1. Computer Vision: ImageNet, AlexNet
     2. NLP: Language models (still immature)
  6. Automate Image Quality Assessment
     To automate the image quality assessment, we trained:
     • Aesthetic model → predicts the aesthetic score of an image
     • Technical model → predicts the technical image quality (distortion, blur, etc.)
     We followed the Google paper "NIMA: Neural Image Assessment", published 09/2017.
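
As a rough illustration of what a NIMA-style model looks like, here is a hedged sketch of a MobileNet backbone with a 10-bucket softmax head, following the idea of the NIMA paper; the exact layers, dropout rate and training setup are assumptions, not the team's actual code.

```python
# Hedged NIMA-style sketch (assumptions: tf.keras, MobileNet backbone,
# 10 score buckets as in the NIMA paper; not the deck's actual implementation).
import numpy as np
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import MobileNet

# ImageNet-pretrained backbone without its classification head
base = MobileNet(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))

# NIMA predicts a distribution over score buckets 1..10, not a single number
x = layers.Dropout(0.75)(base.output)
outputs = layers.Dense(10, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

def mean_score(distribution):
    """Collapse a predicted 10-bucket distribution into one quality score."""
    buckets = np.arange(1, 11)
    return float(np.sum(buckets * distribution))

# Usage (assuming `batch` is a preprocessed image batch of shape (1, 224, 224, 3)):
# probs = model.predict(batch)[0]
# print(mean_score(probs))
```

The same kind of head can in principle back both the aesthetic and the technical model, trained on different labels.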
  7. Results - First Iteration
     Aesthetic model - MobileNet
     • Linear correlation coefficient (LCC): 0.5987
     • Spearman's rank correlation coefficient (SRCC): 0.6072
     • Earth Mover's Distance: 0.2018
     • Accuracy (threshold at 5): 0.74
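
For readers unfamiliar with these metrics, the snippet below shows one way to compute them with NumPy/SciPy on made-up ground-truth and predicted mean scores. Note that NIMA computes the Earth Mover's Distance between score distributions, so the 1-D version on mean scores here is a simplification, not the evaluation the team actually ran.

```python
# Illustrative metric computation on made-up scores (not the deck's numbers).
import numpy as np
from scipy.stats import pearsonr, spearmanr, wasserstein_distance

y_true = np.array([6.2, 4.8, 7.1, 5.0, 3.9])  # ground-truth mean scores (placeholder)
y_pred = np.array([5.9, 5.1, 6.4, 4.7, 4.5])  # predicted mean scores (placeholder)

lcc, _ = pearsonr(y_true, y_pred)              # linear correlation coefficient
srcc, _ = spearmanr(y_true, y_pred)            # Spearman's rank correlation coefficient
emd = wasserstein_distance(y_true, y_pred)     # simplified 1-D Earth Mover's Distance
accuracy = np.mean((y_true > 5) == (y_pred > 5))  # binary accuracy with threshold at 5

print(f"LCC={lcc:.4f} SRCC={srcc:.4f} EMD={emd:.4f} ACC={accuracy:.2f}")
```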
  8. Learnings
     • The first results were not good, but we only learned that because we released it
       ◦ More domain-specific data is needed
     • We could load test our applications, which is very valuable (see the sketch below)
       ◦ Used MobileNet instead of VGG-16
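
Since load testing is called out as valuable here, below is a minimal sketch of how such a load test could look with Locust; the tool choice, the hypothetical /predict endpoint and the payload are assumptions, as the deck does not name them.

```python
# Hypothetical load test with Locust (assumption: Locust 1.x+ and a JSON
# /predict endpoint; run with `locust -f loadtest.py --host=http://<service>`).
from locust import HttpUser, task, between

class PredictionUser(HttpUser):
    # Each simulated user waits 0.1-0.5 s between requests
    wait_time = between(0.1, 0.5)

    @task
    def predict(self):
        # Fire a prediction request with an example payload
        self.client.post("/predict", json={"price": 99, "pis": 8})
```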
  9. Second Iteration
     • We built a simple labeling application
     • ~12 people from idealo Reise and Data Science labeled
       ◦ 1,000 hotel images for aesthetics
       ◦ 3,000 hotel images for technical quality
     • We fine-tuned the aesthetic model with 800 training images
     • Built an aesthetic test dataset with 200 images
  10. This is our tech stack... only an extract ;)
      Categories: PyData, Deep Learning, Big Data, Computer Vision, NLP, Production, Machine Learning, Visualization, Data Preparation
  11. Learnings
      • Data changes constantly, so monitor your model performance on a regular basis
      • A re-training pipeline is also important
      • Don't do it manually; use appropriate tools for this, e.g. Apache Airflow (see the sketch below)
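
As an illustration of the "don't do it manually" point, here is a minimal sketch of a scheduled re-training pipeline as an Apache Airflow DAG. The DAG name, task functions and weekly schedule are placeholders, and it assumes Airflow 2.x; the deck does not show the team's actual pipeline.

```python
# Hedged sketch of a re-training pipeline as an Airflow DAG (Airflow 2.x assumed;
# the DAG id, schedule and task bodies are placeholders).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_training_data():
    """Pull the latest labeled samples from the data store (placeholder)."""

def retrain_model():
    """Re-train the model and persist the new artifact (placeholder)."""

def evaluate_and_deploy():
    """Compare against the current model and deploy only if it improves (placeholder)."""

with DAG(
    dag_id="model_retraining",       # hypothetical name
    start_date=datetime(2019, 6, 1),
    schedule_interval="@weekly",     # re-train on a regular basis
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_training_data)
    train = PythonOperator(task_id="train", python_callable=retrain_model)
    deploy = PythonOperator(task_id="deploy", python_callable=evaluate_and_deploy)

    extract >> train >> deploy
```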
  12. Learnings
      • Use git
      • Dockerize, a.k.a. containerize, everything
      • Use conda and/or pip for package management
      • Automatic pipeline management (testing, data)
      • TDD & API-first strategy (everything as a microservice; see the sketch below)
      • Don't use Jupyter notebooks for production systems
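
To illustrate the "everything as a microservice" point, here is a minimal sketch of an API-first prediction service. Flask, the /predict endpoint and the pickled model artifact are illustrative assumptions rather than the team's actual stack.

```python
# Hypothetical prediction microservice (assumptions: Flask and a pickled
# scikit-learn-style model shipped inside the container as model.pkl).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # hypothetical model artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                    # e.g. {"price": 99, "pis": 8}
    features = [[payload["price"], payload["pis"]]]
    probability = float(model.predict_proba(features)[0][1])
    return jsonify({"sale_probability": probability})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A service cut like this is easy to containerize and to test endpoint by endpoint, which is what the TDD & API-first strategy aims at.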
  13. Summary
      1. Think simple first and then, if it's really needed, get more complex
      2. Define your data product MVP and release as early as possible
      3. Creating data products is a team sport
      4. Use the right tool for the right problem
      5. Use the cloud
      6. Measure your model and improve it from time to time
      7. Your results need to be reproducible
      8. Prioritize the projects with the biggest business impact