Data Science and Machine Learning for dinner

Fueling family life through good food

Simple model
• You pick from 22 recipes each week
• We deliver a box of wholesome ingredients in exact proportions with step-by-step recipe cards
• No planning, no supermarkets and no food waste – you just cook

Leading proposition
• Most choice (22 is just the start)
• Most delivery options (7 days, with am, pm and evening slots)
• Best price

Team
• Dejan, Head of Data Science
• Manuel, Machine Learning Engineer
• Marc, Data Scientist
• Irene, Data Scientist

Our journey so far: 240% growth per year

The beginning

Turning into

Data is the voice of our customers

Collect Everything • Store Everything • Expose Everything*
* Whilst maintaining data security

Architecture diagram components
• Sources: apps, web, microservices, 3rd party
• Airflow (ETL)
• Amazon Redshift (data warehouse)
• Amazon S3 (data archive)
• Amazon DMS (data migration)
• (event logging)
• Periscope (analytics and dashboards)
• Consumers: data scientists, data products, business users

Building innovative data products
1. Start early: establish a data science capability from the beginning
2. Build what the business needs: use agile product management and scrum to deliver value early
3. Build production-ready products: we practice devops at Gousto - “You build it, you run it”
4. Create focus: create a separate, dedicated data analytics function; all data insight queries go there
5. Measure success through data: make it accountable to the investment being made

Projects

Data science all along the user journey
• Marketing Attribution
• Automated Stock Manipulation
• Forecasting
• Warehouse Optimisation
• Personalisation
• AD

Forecasting

Forecasting

What
• Our short lead time means we can’t always purchase our ingredients against actual orders
• We need to forecast the total number of boxes and how many ingredients we need to buy
• There are a lot of perishables: forecasting is key to being a sustainable business

How
• Predicted acquisition strength
• Live customer trends
• Retention (cohort analysis)
• Seasonality (Facebook Prophet)

[Chart labels: Forecast, Waste, Average]
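A minimal sketch of how a box forecast could be turned into ingredient quantities. The slides do not describe the actual model, so the way the inputs are combined here (retained customers plus expected new customers, scaled by a seasonal factor) is an assumption, and every name and number is hypothetical.

```python
from collections import defaultdict


def forecast_boxes(cohorts, expected_new_customers, seasonal_factor):
    """Combine retention, acquisition and seasonality into a box forecast.

    cohorts: list of (cohort_size, expected_retention_rate) pairs.
    This additive/multiplicative form is an illustrative assumption.
    """
    retained = sum(size * retention for size, retention in cohorts)
    return (retained + expected_new_customers) * seasonal_factor


def ingredient_demand(total_boxes, recipes_per_box, recipe_share, recipe_ingredients):
    """Translate a box forecast into ingredient quantities to purchase."""
    demand = defaultdict(float)
    for recipe, share in recipe_share.items():
        # Expected number of times this recipe is ordered across all boxes.
        recipe_orders = total_boxes * recipes_per_box * share
        for ingredient, qty in recipe_ingredients[recipe].items():
            demand[ingredient] += recipe_orders * qty
    return dict(demand)


# Hypothetical inputs, for illustration only.
boxes = forecast_boxes(
    cohorts=[(1000, 0.5), (500, 0.8)],
    expected_new_customers=100,
    seasonal_factor=1.0,
)
demand = ingredient_demand(
    total_boxes=boxes,
    recipes_per_box=3,
    recipe_share={"chicken_curry": 0.6, "veg_pasta": 0.4},
    recipe_ingredients={
        "chicken_curry": {"chicken_g": 300, "onion": 1},
        "veg_pasta": {"pasta_g": 200, "onion": 1},
    },
)
```

In practice the seasonal factor would come from a fitted time-series model (the slide names Facebook Prophet) rather than a constant.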

Forecasting
[Chart: Menu 298, Menu 299 and Menu 300 over Week 1 to Week 3. For illustration purposes only]

For illustration purposes only

Process: Forecast → Purchase → Final Orders → Delivery

Process: Forecast → Purchase → Final Orders → Delivery, with added Re-forecast and Adjust Purchases steps
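The re-forecast step can be read as a simple difference between what was already purchased and what the updated forecast calls for. The sketch below assumes that reading; the function name and quantities are hypothetical.

```python
def purchase_adjustments(purchased, reforecast):
    """Return the quantity to add (positive) or cut (negative) per ingredient.

    Ingredients whose purchased quantity already matches the re-forecast
    are left out of the result.
    """
    ingredients = set(purchased) | set(reforecast)
    return {
        name: reforecast.get(name, 0) - purchased.get(name, 0)
        for name in sorted(ingredients)
        if reforecast.get(name, 0) != purchased.get(name, 0)
    }


# Hypothetical quantities, for illustration only.
adjustments = purchase_adjustments(
    purchased={"onion": 3000, "pasta_g": 240000, "chicken_g": 540000},
    reforecast={"onion": 3200, "pasta_g": 230000, "chicken_g": 540000, "lime": 50},
)
```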

Warehouse optimisation

Specific setup

Gousto
• Fresh items
• SKUs and their quantities change on a weekly basis
• SKU quantities change on a weekly / daily basis
• SKUs are placed in different locations every week
• Optimising the pick face AND finding the shortest path to collect items

General e-commerce
• Mostly items with long shelf-life
• SKUs rarely change (static)
• SKU quantities are cyclical / seasonal
• SKUs are placed in the same locations
• Finding the shortest path to collect items

How do we do it?
• Completely new menu each week (requires a weekly redesign of our warehouse)
• First we forecast total orders per recipe, per day
• The orders forecast is consumed by genetic algorithms to calculate:
  ◦ Optimal pick-face layout from billions of combinations to balance load across our pick stations
  ◦ Optimal picking order and line routing to reduce congestion at these pick stations
• Wondering about results?
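To make the load-balancing idea concrete, here is a toy genetic algorithm that assigns SKUs to pick stations so that the busiest station carries as little forecast demand as possible. It is a sketch under simplifying assumptions (one demand number per SKU, no routing, truncation selection, one-point crossover), not the production encoding; all names and figures are hypothetical.

```python
import random


def station_loads(assignment, demand, n_stations):
    """Total forecast demand landing on each station."""
    loads = [0] * n_stations
    for sku, station in enumerate(assignment):
        loads[station] += demand[sku]
    return loads


def fitness(assignment, demand, n_stations):
    """Lower is better: the busiest station is the bottleneck."""
    return max(station_loads(assignment, demand, n_stations))


def evolve(demand, n_stations, pop_size=60, generations=200,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(demand)
    # Each individual maps SKU index -> station index.
    population = [[rng.randrange(n_stations) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: fitness(a, demand, n_stations))
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: occasionally move a SKU to a random station.
            child = [rng.randrange(n_stations) if rng.random() < mutation_rate else s
                     for s in child]
            children.append(child)
        population = parents + children
    return min(population, key=lambda a: fitness(a, demand, n_stations))


# Hypothetical forecast demand per SKU, for illustration only.
demand = [40, 35, 30, 25, 20, 20, 15, 10, 5]
best = evolve(demand, n_stations=4)
best_loads = station_loads(best, demand, 4)
```

A realistic fitness function would also have to score picking order and line routing, which the slide lists as a second optimisation target.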

Achievements

What
• Minor continuous tweaks to support massive growth
• Grouping recipe SKUs per station

Performance analysis
• Throughput: 70% improvement
• Initial results of recipe grouping indicate that we can reduce station visits from ~11 to ~8

Plans

What
• Factor in picking behaviour (e.g. time to pick and item similarity)
• Automate replenishment from a mezzanine floor
• Image recognition for quality inspection

Expected outcome
• Improve pick time and reduce picking errors
• Improve the quality of work for pickers

Personalisation

Revolutionising family dinners in the UK

Personalise
• Menu
• Communication

Customise
• Premium recipes
• Add-ons (e.g. wine)

Building a recommender system

Content-based Filtering
• Recommend recipes on the basis of what a user has ordered previously and which recipes have a high similarity in N-dim feature space

Collaborative Filtering
• Recommend recipes on the basis of what a user has ordered / clicked on (implicit feedback) and what other users with high similarity did
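A minimal content-based sketch: represent each recipe as a feature vector, average the vectors of a user's past orders into a profile, and rank unseen recipes by cosine similarity to that profile. The feature dimensions, recipe names and the choice of cosine similarity are assumptions for illustration; the slides do not say which similarity measure is used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(ordered, features, top_k=2):
    """Rank recipes the user has not ordered by similarity to their profile."""
    dims = len(next(iter(features.values())))
    # User profile: mean feature vector of previously ordered recipes.
    profile = [sum(features[r][d] for r in ordered) / len(ordered)
               for d in range(dims)]
    candidates = [r for r in features if r not in ordered]
    ranked = sorted(candidates,
                    key=lambda r: cosine(profile, features[r]),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical features: [spicy, vegetarian, pasta, quick].
features = {
    "chicken_curry": [1.0, 0.0, 0.0, 1.0],
    "veg_pasta": [0.0, 1.0, 1.0, 1.0],
    "beef_chilli": [1.0, 0.0, 0.0, 0.0],
    "mushroom_risotto": [0.0, 1.0, 0.0, 0.0],
}
suggestions = recommend(["chicken_curry"], features, top_k=2)
```

Collaborative filtering replaces the hand-built feature vectors with signals learned from what similar users ordered or clicked on.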

Approach

Factorization Machines*
• Generalization of matrix factorization and polynomial regression
• Uses factorization to approximate pairwise (or higher order) interactions under data sparsity
• Allows fitting on both interaction and item/user meta-data
• Optimized with e.g. SGD

* Reference: Rendle, Steffen. “Factorization Machines with libFM.” ACM TIST 3 (2012)
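For reference, the second-order factorization machine predicts y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where each feature i has a latent vector v_i of length k. The sketch below evaluates that equation in two ways: the direct pairwise sum and the well-known linear-time reformulation. It covers prediction only (no training), uses libFM nowhere, and the parameter values are hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM prediction in O(k * n) time.

    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, written as the explicit sum over feature pairs."""
    n = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            out += dot * x[i] * x[j]
    return out


# Hypothetical parameters: 3 features, latent dimension k = 2.
x = [1.0, 0.0, 1.0]          # e.g. one-hot user and recipe indicators
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
score = fm_predict(x, w0, w, V)
```

Because interactions are expressed through the latent vectors, a user and a recipe that never co-occurred in the training data still get a non-trivial interaction score, which is what makes the model usable under sparsity.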

‘Live’ validation

Simple Experiment
• Collaborative recommendation engine (v1)
• Personalised weekly menu reveal email
• Treatment involved only recipes and their sequence!

Performance analysis
• Click-conversion rate is ~10% higher
• Basket-match rate is ~27% higher

[Charts: Click-conversion, Basket-match rate]
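The slide does not say whether the uplifts are relative or absolute, nor how large the test groups were. Assuming they are relative uplifts, a sketch of how such a figure and a basic significance check could be computed looks like this; the counts are made up for illustration and are not Gousto's data.

```python
import math


def relative_uplift(control_rate, treatment_rate):
    """Relative change of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two conversion rates (pooled)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical counts: 10,000 emails per arm.
control_clicks, treatment_clicks, emails_per_arm = 1000, 1100, 10000
uplift = relative_uplift(control_clicks / emails_per_arm,
                         treatment_clicks / emails_per_arm)
z = two_proportion_z(control_clicks, emails_per_arm,
                     treatment_clicks, emails_per_arm)
```

With these invented counts the relative uplift is 10% and the z statistic is a little above 2, so whether a real ~10% uplift is significant depends heavily on the actual sample size.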

Personalisation Pipeline (Airflow)
• S3 / Redshift
• Get user data, Get recipe data, Get orders data
• Encode orders data (batch 1, batch 2, ..., batch n)
• Train Model
• Recommend recipes (batch 1, batch 2, ..., batch n)
• SQS / Recipe Service
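The batched steps in the pipeline can be sketched as a generator that hands out fixed-size chunks of users, so that only one batch of recommendations is held in memory at a time. The function names and the `score_batch` / `publish` callables are hypothetical stand-ins for the model and for the SQS / Recipe Service hand-off.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_recommendations(user_ids, score_batch, publish, batch_size=1000):
    """Score users batch by batch and publish each batch before the next one."""
    total = 0
    for batch in in_batches(user_ids, batch_size):
        recommendations = score_batch(batch)  # e.g. model inference
        publish(recommendations)              # e.g. send to a queue
        total += len(recommendations)
    return total


# Hypothetical stand-ins, for illustration only.
published = []
count = run_recommendations(
    user_ids=range(5),
    score_batch=lambda users: [(u, ["recipe_a", "recipe_b"]) for u in users],
    publish=published.append,
    batch_size=2,
)
```

In an Airflow deployment each batch would more likely be its own task, which also allows batches to run in parallel.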

Challenges

Work with limited resources
• Batching
• Garbage Collection
• Encapsulation
• Multi-processing/threading
• Optimise operations with low-level libraries
• Get more resources

Integrate with business
• Robustness (Reliability)
• Validation (Performance)

Next Steps
• Scalable, maintainable code
• Distributed architecture / Auto scaling
• TDD (Test Driven Development)
• Incremental model update
• Reusable “Data Science” modules

Opportunities

Open roles

Team objective
A growing team focused on delivering step changes in KPIs through deployment and stewardship of algorithms

Timeline: Q1 ‘17, Q3 ‘17, 2018

Internships
Data Science Project (12w summer project)
Levels: PhD, MPhil / MASt / M*, UG (Final year)
Subjects: Any quantitative, Maths / OR, CompSci, CompBio / Chem, Econ

Gousto Tech
