Slide 1

Slide 1 text

Data Science and Machine Learning for dinner

Slide 2

Slide 2 text

Fueling family life through good food
Simple model
● You pick from 22 recipes each week
● We deliver a box of wholesome ingredients in exact proportions with step-by-step recipe cards
● No planning, no supermarkets and no food waste – you just cook
Leading proposition
● Most choice (22 is just the start)
● Most delivery options (7 days, with am, pm and evening slots)
● Best price

Slide 3

Slide 3 text

Team
● Dejan, Head of Data Science
● Manuel, Machine Learning Engineer
● Marc, Data Scientist
● Irene, Data Scientist

Slide 4

Slide 4 text

Our journey so far
240% growth per year

Slide 5

Slide 5 text

The beginning

Slide 6

Slide 6 text

Turning into

Slide 7

Slide 7 text

Data is the voice of our customers
Collect Everything, Store Everything, Expose Everything*
* Whilst maintaining data security
[Architecture diagram]
● Sources: apps, web, microservices, 3rd party
● Pipeline and storage: Airflow (ETL), Amazon Redshift (data warehouse), Amazon S3 (data archive), Amazon DMS (data migration), (event logging)
● Consumers: data scientists, data products, business users, Periscope (analytics and dashboards)

Slide 8

Slide 8 text

Building innovative data products
1. Start early: Establish a data science capability from the beginning
2. Build what the business needs: Use agile product management and scrum to deliver value early
3. Build production-ready products: We practice DevOps at Gousto - “You build it, you run it”
4. Create focus: Create a separate, dedicated data analytics function; all data insight queries go there
5. Measure success through data: Make it accountable to the investment being made

Slide 9

Slide 9 text

Projects

Slide 10

Slide 10 text

Data science all along the user journey
● Marketing Attribution
● Automated Stock Manipulation
● Forecasting
● Warehouse Optimisation
● Personalisation

Slide 11

Slide 11 text

Forecasting

Slide 12

Slide 12 text

Forecasting
What
● Our short lead time means we can’t always purchase our ingredients against actual orders
● We need to forecast the total number of boxes and how many ingredients we need to buy
● There are a lot of perishables: forecasting is key to being a sustainable business
How
● Predicted acquisition strength
● Live customer trends
● Retention (cohort analysis)
● Seasonality (Facebook Prophet)
[Chart: Forecast, Waste, Average]
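One way the "How" inputs might combine into a box forecast is sketched below. This is a minimal illustration under stated assumptions, not the model from the talk: the function name `forecast_boxes`, the layout of the cohort and retention inputs, and the single multiplicative `seasonal_factor` are all hypothetical. A library such as Prophet would normally supply the seasonal component; here it is just a number passed in.

```python
from typing import Sequence


def forecast_boxes(
    cohort_sizes: Sequence[float],
    retention_curve: Sequence[float],
    seasonal_factor: float = 1.0,
) -> float:
    """Expected boxes for the forecast week (hypothetical helper).

    cohort_sizes[a]    -- customers who signed up `a` weeks before the
                          forecast week; index 0 is the predicted
                          acquisition for that week.
    retention_curve[a] -- assumed fraction of a cohort that orders
                          `a` weeks after signing up.
    seasonal_factor    -- assumed multiplicative seasonality adjustment.
    """
    if not retention_curve:
        raise ValueError("retention_curve must not be empty")
    expected = 0.0
    for age, size in enumerate(cohort_sizes):
        # Cohorts older than the curve reuse its last observed rate.
        rate = retention_curve[min(age, len(retention_curve) - 1)]
        expected += size * rate
    return expected * seasonal_factor
```

For example, `forecast_boxes([1000, 800, 500], [1.0, 0.5, 0.4])` sums the contribution of each cohort at its own retention rate.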

Slide 13

Slide 13 text

Forecasting
[Figure: forecasts for Menu 298, Menu 299 and Menu 300 across Week 1, Week 2 and Week 3. For illustration purposes only]

Slide 14

Slide 14 text

For illustration purposes only

Slide 15

Slide 15 text

Process
Forecast → Purchase → Final Orders → Delivery

Slide 16

Slide 16 text

Process
Forecast → Purchase → Final Orders → Delivery
Additional steps: Re-forecast, Adjust Purchases

Slide 17

Slide 17 text

Warehouse optimisation

Slide 18

Slide 18 text

Specific setup
Gousto
● Fresh items
● SKUs and their quantities change on a weekly basis
● SKU quantities change on a weekly / daily basis
● SKUs are placed in different locations every week
● Optimising the pick-face AND finding the shortest path to collect items
General e-commerce
● Mostly items with long shelf-life
● SKUs rarely change (static)
● SKU quantities are cyclical / seasonal
● SKUs are placed in the same locations
● Finding the shortest path to collect items

Slide 19

Slide 19 text

How do we do it?
● Completely new menu each week (requires a weekly redesign of our warehouse)
● First we forecast total orders per recipe, per day
● The orders forecast is consumed by genetic algorithms to calculate:
  ○ Optimal pick-face layout from billions of combinations to balance load across our pick stations
  ○ Optimal picking order and line routing to reduce congestion at these pick stations
● Wondering about results?
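The load-balancing idea can be illustrated with a toy genetic algorithm. This is a sketch under assumptions, not the production system: each SKU is assigned to exactly one station, the cost is simply the busiest station's forecast demand, and the operators (elitist selection, one-point crossover, random-reset mutation) and all names are chosen for illustration. Routing and congestion are not modelled.

```python
import random
from typing import List, Sequence


def station_loads(layout: Sequence[int], demand: Sequence[float],
                  n_stations: int) -> List[float]:
    """Total forecast demand assigned to each pick station."""
    loads = [0.0] * n_stations
    for sku, station in enumerate(layout):
        loads[station] += demand[sku]
    return loads


def cost(layout: Sequence[int], demand: Sequence[float],
         n_stations: int) -> float:
    """Load of the busiest station; lower means better balanced."""
    return max(station_loads(layout, demand, n_stations))


def evolve_layout(demand: Sequence[float], n_stations: int,
                  pop_size: int = 40, generations: int = 200,
                  mutation_rate: float = 0.1, seed: int = 0) -> List[int]:
    """Return a layout where layout[sku] is the station index for that SKU."""
    n = len(demand)
    if n < 2 or pop_size < 4 or n_stations < 1:
        raise ValueError("need >= 2 SKUs, pop_size >= 4 and >= 1 station")
    rng = random.Random(seed)
    # Include a round-robin baseline so the result is never worse than it.
    population = [[i % n_stations for i in range(n)]]
    population += [[rng.randrange(n_stations) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda ind: cost(ind, demand, n_stations))
        survivors = population[: pop_size // 2]  # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n):                   # random-reset mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_stations)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda ind: cost(ind, demand, n_stations))
```

Because the best half of each generation survives unchanged, the returned layout is at least as balanced as the round-robin starting point.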

Slide 20

Slide 20 text

Achievements
What
● Minor continuous tweaks to support massive growth
● Grouping recipe SKUs per station
Performance analysis
● Throughput: 70% improvement
● Initial results of recipe grouping indicate that we can reduce station visits from ~11 to ~8

Slide 21

Slide 21 text

Plans
What
● Factoring in picking behaviour (e.g. time to pick and item similarity)
● Automate replenishment from a mezzanine floor
● Image recognition for quality inspection
Expected outcome
● Improve pick time and reduce picking errors
● Improve the quality of work for pickers

Slide 22

Slide 22 text

Personalisation

Slide 23

Slide 23 text

Revolutionising family dinners in the UK
Personalise
● Menu
● Communication
Customise
● Premium recipes
● Add-ons (e.g. wine)

Slide 24

Slide 24 text

Building a recommender system
Content-based Filtering
● Recommend recipes on the basis of what a user has ordered previously and which recipes are highly similar to those in an N-dimensional feature space
Collaborative Filtering
● Recommend recipes on the basis of what a user has ordered / clicked on (implicit feedback) and what other, highly similar users did
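Content-based filtering can be sketched in a few lines. The example below is illustrative only: it assumes each recipe is already described by a numeric feature vector, builds a user profile as the mean of the vectors of previously ordered recipes, and ranks the remaining recipes by cosine similarity. The function names and the feature encoding are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(features: Dict[str, Sequence[float]],
              ordered: Sequence[str], top_k: int = 3) -> List[str]:
    """Rank recipes the user has not ordered by similarity to their profile."""
    if not ordered:
        raise ValueError("need at least one previously ordered recipe")
    dims = len(features[ordered[0]])
    # User profile: mean feature vector of the recipes ordered so far.
    profile = [sum(features[r][d] for r in ordered) / len(ordered)
               for d in range(dims)]
    candidates = [r for r in features if r not in ordered]
    candidates.sort(key=lambda r: cosine(profile, features[r]), reverse=True)
    return candidates[:top_k]
```

With hypothetical features `[meat, vegetarian, spicy]`, a user who ordered a vegetarian curry is shown other spicy vegetarian recipes first.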

Slide 25

Slide 25 text

Approach
Factorization Machines*
● Generalization of matrix factorization and polynomial regression
● Uses factorization to approximate pairwise (or higher order) interactions under data sparsity
● Allows fitting on both interaction and item/user meta-data
● Optimized with e.g. SGD
* Reference: Rendle, Steffen. “Factorization Machines with libFM.” ACM TIST 3 (2012)
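For reference, the second-order factorization machine prediction is y(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The sketch below evaluates it with the standard linear-time reformulation of the pairwise term. It covers prediction only (no training), and the function and parameter names are illustrative rather than taken from libFM.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Second-order factorization machine prediction.

    x  -- feature vector (length n)
    w0 -- global bias
    w  -- linear weights (length n)
    V  -- latent factors, V[i] is the k-dimensional vector of feature i
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} <v_i, v_j> x_i x_j
        #   = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += s * s - s_sq
    return w0 + linear + 0.5 * pairwise
```

The reformulation costs O(k·n) instead of O(k·n²), and with sparse inputs only the non-zero features contribute, which is what makes the model practical under data sparsity.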

Slide 26

Slide 26 text

‘Live’ validation
Simple Experiment
● Collaborative recommendation engine (v1)
● Personalised weekly menu reveal email
● Treatment involved only recipes and their sequence!
Performance analysis
● Click-conversion rate is ~10% higher
● Basket-match rate is ~27% higher
[Charts: Click-conversion, Basket-match rate]

Slide 27

Slide 27 text

Personalisation Pipeline (Airflow)
● Source: S3 / Redshift
● Get user data, get recipe data, get orders data
● Train model
● Encode orders data (batch 1, batch 2, ..., batch n)
● Recommend recipes (batch 1, batch 2, ..., batch n)
● Destination: SQS / Recipe Service
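The batch-wise encode and recommend steps can be sketched as a generator-driven loop that holds only one batch in memory at a time. This is a minimal illustration, not the Airflow pipeline itself: `encode`, `recommend` and `publish` are hypothetical callables standing in for the encoding step, the trained model, and the hand-off to SQS / the recipe service.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def run_pipeline(user_ids: Iterable[T],
                 encode: Callable[[List[T]], object],
                 recommend: Callable[[object], object],
                 publish: Callable[[object], None],
                 batch_size: int) -> int:
    """Encode, recommend and publish one batch at a time; return batch count."""
    n_batches = 0
    for batch in batched(user_ids, batch_size):
        encoded = encode(batch)        # hypothetical: orders -> model input
        publish(recommend(encoded))    # hypothetical: send downstream
        n_batches += 1
    return n_batches
```

Processing per batch keeps peak memory proportional to the batch size rather than the customer base, which is one way to work within limited resources.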

Slide 28

Slide 28 text

Challenges
Work with limited resources
● Batching
● Garbage Collection
● Encapsulation
● Multi-processing/threading
● Optimise operations with low-level libraries
● Get more resources
Integrate with business
● Robustness (Reliability)
● Validation (Performance)

Slide 29

Slide 29 text

Next Steps
● Scalable, maintainable code
● Distributed architecture / Auto scaling
● TDD (Test Driven Development)
● Incremental model update
● Reusable “Data Science” modules

Slide 30

Slide 30 text

Opportunities

Slide 31

Slide 31 text

Open roles
Team objective: A growing team focused on delivering step changes in KPIs through deployment and stewardship of algorithms
[Timeline: Q1 ‘17, Q3 ‘17, 2018]

Slide 32

Slide 32 text

Internships
Data Science Project (12w summer project)
Levels: PhD, MPhil / MASt / M*, UG (Final year)
Subjects: Any quantitative, Maths / OR, CompSci, CompBio / Chem, Econ