Data Science and Machine Learning for dinner

Gousto Tech
November 08, 2017

Transcript

  1. Fueling family life through good food

    Simple model
    • You pick from 22 recipes each week
    • We deliver a box of wholesome ingredients in exact proportions with step-by-step recipe cards
    • No planning, no supermarkets and no food waste – you just cook
    Leading proposition
    • Most choice (22 is just the start)
    • Most delivery options (7 days, with am, pm and evening slots)
    • Best price
  2. Team

    Dejan, Head of Data Science
    Manuel, Machine Learning Engineer
    Marc, Data Scientist
    Irene, Data Scientist
  3. Data is the voice of our customers

    • Collect Everything
    • Store Everything
    • Expose Everything*
    * Whilst maintaining data security
    [Diagram. Data sources: apps, web, microservices, 3rd party. Infrastructure: Airflow (ETL), Amazon Redshift (data warehouse), Amazon S3 (data archive), Amazon DMS (data migration), event logging. Consumers: data scientists, data products, business users, Periscope (analytics and dashboards).]
  4. Building innovative data products

    (1) Start early: establish a data science capability from the beginning
    (2) Build what the business needs: use agile product management and scrum to deliver value early
    (3) Build production-ready products: we practice DevOps at Gousto, “You build it, you run it”
    (4) Create focus: create a separate, dedicated data analytics function; all data insight queries go there
    (5) Measure success through data: make it accountable to the investment being made
  5. Forecasting

    What
    • Our short lead time means we can’t always purchase our ingredients against actual orders
    • We need to forecast the total number of boxes and how many of each ingredient we need to buy
    • There are a lot of perishables: forecasting is key to being a sustainable business
    How
    • Predicted acquisition strength
    • Live customer trends
    • Retention (cohort analysis)
    • Seasonality (Facebook Prophet)
    [Chart series: Forecast, Waste, Average]
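The deck lists the inputs to the forecast but not how they are combined. A minimal sketch, assuming a simple cohort model: each acquisition cohort contributes its size times a retention rate indexed by weeks since acquisition, next week's predicted acquisitions form a new cohort of age 0, and a multiplicative seasonal factor is applied at the end. The function names, the plateauing retention curve and all numbers are hypothetical; in the deck, seasonality comes from Facebook Prophet rather than a constant.

```python
from typing import Dict, List


def forecast_boxes(
    cohort_sizes: Dict[int, float],  # acquisition week -> customers acquired that week
    retention: List[float],          # retention[age] = share of a cohort ordering `age` weeks after acquisition
    target_week: int,
    predicted_new: float,            # forecast acquisitions in target_week ("acquisition strength")
    seasonal_factor: float = 1.0,    # multiplicative seasonality, assumed to come from a separate model
) -> float:
    """Expected boxes in target_week: each cohort's size times its retention at that age."""
    cohorts = dict(cohort_sizes)
    # Newly acquired customers form a cohort of age 0.
    cohorts[target_week] = cohorts.get(target_week, 0.0) + predicted_new
    expected = 0.0
    for week, size in cohorts.items():
        age = target_week - week
        if age < 0:
            continue  # cohort acquired after the target week
        # Assumption: retention plateaus at its last value once the curve runs out.
        expected += size * retention[min(age, len(retention) - 1)]
    return expected * seasonal_factor


def ingredient_demand(
    boxes: float,
    recipes_per_box: int,
    recipe_share: Dict[str, float],                       # share of recipe picks going to each recipe
    ingredients_per_recipe: Dict[str, Dict[str, float]],  # recipe -> {ingredient: quantity per recipe}
) -> Dict[str, float]:
    """Turn a box forecast into an ingredient purchase forecast."""
    demand: Dict[str, float] = {}
    for recipe, share in recipe_share.items():
        recipe_orders = boxes * recipes_per_box * share
        for ingredient, qty in ingredients_per_recipe[recipe].items():
            demand[ingredient] = demand.get(ingredient, 0.0) + recipe_orders * qty
    return demand
```

For example, with cohorts {1: 1000, 2: 800}, retention [1.0, 0.6, 0.5, 0.45], target week 3 and 500 predicted new customers, the model adds 1000 × 0.5 + 800 × 0.6 + 500 × 1.0 = 1480 expected boxes before seasonality.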
  6. Forecasting

    [Illustration: Menu 298, Menu 299 and Menu 300 across Week 1, Week 2 and Week 3. For illustration purposes only.]
  7. Specific setup

    Gousto
    • Fresh items
    • SKUs and their quantities change on a weekly basis
    • SKU quantities change on a weekly / daily basis
    • SKUs are placed in different locations every week
    • Optimise the pick face AND find the shortest path to collect items
    General e-commerce
    • Mostly items with a long shelf life
    • SKUs change rarely (static)
    • SKU quantities are cyclical / seasonal
    • SKUs are placed in the same locations
    • Find the shortest path to collect items
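Finding the shortest path to collect a set of items is a travelling-salesman-style problem. As a rough illustration only, here is a sketch of the common nearest-neighbour heuristic on a hypothetical grid warehouse with Manhattan (aisle) distances; it is a swapped-in textbook heuristic, not the method described in the deck, and it does not guarantee the shortest route. Because Gousto's SKU locations change every week, such a route would have to be recomputed against each week's layout.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # hypothetical (aisle, bay) coordinates on a grid-shaped warehouse


def manhattan(a: Location, b: Location) -> int:
    """Walking distance along aisles on a grid layout."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_neighbour_route(start: Location, pick_locations: Dict[str, Location]) -> List[str]:
    """Greedy pick order: always walk to the closest SKU not yet collected.

    Fast, but only a heuristic: the resulting route can be longer than the optimum.
    """
    remaining = dict(pick_locations)
    route: List[str] = []
    current = start
    while remaining:
        # Ties are broken by SKU name so the result is deterministic.
        sku = min(remaining, key=lambda s: (manhattan(current, remaining[s]), s))
        route.append(sku)
        current = remaining.pop(sku)
    return route


def route_length(start: Location, pick_locations: Dict[str, Location], route: List[str]) -> int:
    """Total walking distance for visiting the SKUs in the given order."""
    total, current = 0, start
    for sku in route:
        total += manhattan(current, pick_locations[sku])
        current = pick_locations[sku]
    return total
```

Starting at (0, 0) with rice at (0, 1), kale at (3, 1) and milk at (0, 5), the greedy route is rice, kale, milk with a length of 11.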
  8. How do we do it?

    • Completely new menu each week (requires a weekly redesign of our warehouse)
    • First, we forecast total orders per recipe, per day
    • The orders forecast is consumed by genetic algorithms to calculate:
      ◦ The optimal pick-face layout, chosen from billions of combinations, to balance load across our pick stations
      ◦ The optimal picking order and line routing to reduce congestion at these pick stations
    • Wondering about results?
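The deck does not say how the genetic algorithm encodes a layout or scores it. A minimal sketch, assuming a chromosome assigns each recipe to a pick station, fitness is the load on the busiest station (forecast orders, lower is better), and the operators are tournament selection, uniform crossover, random-reset mutation and elitism. All names and parameters are hypothetical, and the sketch ignores pick-face capacity, SKU grouping and congestion.

```python
import random
from typing import List


def station_loads(assignment: List[int], demand: List[float], n_stations: int) -> List[float]:
    """Total forecast orders handled by each station under a recipe -> station assignment."""
    loads = [0.0] * n_stations
    for recipe, station in enumerate(assignment):
        loads[station] += demand[recipe]
    return loads


def fitness(assignment: List[int], demand: List[float], n_stations: int) -> float:
    """Lower is better: the busiest station limits throughput."""
    return max(station_loads(assignment, demand, n_stations))


def evolve_layout(demand: List[float], n_stations: int, pop_size: int = 60,
                  generations: int = 200, mutation_rate: float = 0.05, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n = len(demand)

    def score(a: List[int]) -> float:
        return fitness(a, demand, n_stations)

    # Each chromosome is a list: gene i is the station that recipe i is placed at.
    population = [[rng.randrange(n_stations) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: never lose the best layout found so far
        while len(children) < pop_size:
            # Tournament selection: the fitter of 3 random layouts becomes a parent.
            p1 = min(rng.sample(population, 3), key=score)
            p2 = min(rng.sample(population, 3), key=score)
            # Uniform crossover: each gene comes from either parent with equal probability.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Random-reset mutation: occasionally move a recipe to a random station.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_stations)
            children.append(child)
        population = children
        best = min(population, key=score)
    return best
```

Even 22 recipes over 4 stations give 4^22 ≈ 1.8 × 10^13 possible assignments, which is why a search heuristic is used rather than enumeration.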
  9. Achievements

    What
    • Minor continuous tweaks to support massive growth
    • Grouping recipe SKUs per station
    Performance analysis
    • Throughput: 70% improvement
    • Initial results of recipe grouping indicate that we can reduce station visits from ~11 to ~8
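To see why grouping a recipe's SKUs per station reduces station visits, count the distinct stations an order's SKUs sit at. A minimal sketch with hypothetical SKU names and a toy two-recipe box; the numbers are illustrative and unrelated to the ~11 to ~8 result above.

```python
from typing import Dict, Iterable, List


def station_visits(order_skus: Iterable[str], sku_station: Dict[str, int]) -> int:
    """Number of distinct pick stations a box must visit to collect its SKUs."""
    return len({sku_station[sku] for sku in order_skus})


def mean_visits(orders: List[List[str]], sku_station: Dict[str, int]) -> float:
    """Average station visits per order for a given SKU -> station layout."""
    return sum(station_visits(order, sku_station) for order in orders) / len(orders)


# Hypothetical box with two recipes: A = a1, a2, a3 and B = b1, b2.
order = ["a1", "a2", "a3", "b1", "b2"]
scattered = {"a1": 0, "a2": 1, "a3": 2, "b1": 3, "b2": 0}  # recipe SKUs spread over stations
grouped = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1}    # each recipe's SKUs at one station
```

Here the scattered layout needs 4 station visits for this box, while the grouped layout needs 2.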
  10. Plans

    What
    • Factoring in picking behaviour (e.g. time to pick and item similarity)
    • Automating replenishment from a mezzanine floor
    • Image recognition for quality inspection
    Expected outcome
    • Improved pick times and fewer picking errors
    • Improved quality of work for pickers
  11. Building a recommender system

    Content-based filtering
    • Recommend recipes on the basis of what a user has ordered previously and which recipes have a high similarity in an N-dimensional feature space
    Collaborative filtering
    • Recommend recipes on the basis of what a user has ordered / clicked on (implicit feedback) and what other, highly similar users did
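Content-based filtering can be illustrated with cosine similarity between recipe feature vectors. A minimal sketch, assuming each recipe is a hypothetical hand-made feature vector (protein and cuisine flags) and that unordered recipes are ranked by their mean similarity to the user's past orders; the deck does not describe its actual features or scoring.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_based_recommend(history: List[str], features: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Rank recipes the user has not ordered by mean similarity to the recipes they have ordered."""
    if not history:
        return []  # cold start: content-based filtering needs at least one past order
    ordered = set(history)
    scores = {
        recipe: sum(cosine(vec, features[h]) for h in history) / len(history)
        for recipe, vec in features.items()
        if recipe not in ordered
    }
    # Highest score first; ties broken by recipe name.
    return sorted(scores, key=lambda r: (-scores[r], r))[:k]


# Hypothetical features: [chicken, beef, vegetarian, asian, italian]
features = {
    "chicken_curry": [1, 0, 0, 1, 0],
    "chicken_pasta": [1, 0, 0, 0, 1],
    "beef_ragu": [0, 1, 0, 0, 1],
    "veg_stir_fry": [0, 0, 1, 1, 0],
    "beef_pho": [0, 1, 0, 1, 0],
}
```

For a user who ordered chicken_curry and veg_stir_fry, beef_pho (which shares the Asian flag with both) ranks ahead of chicken_pasta.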
  12. Approach

    Factorization Machines*
    • Generalization of matrix factorization and polynomial regression
    • Uses factorization to approximate pairwise (or higher-order) interactions under data sparsity
    • Allows fitting on both interaction data and item/user metadata
    • Optimized with e.g. SGD
    Reference
    * Rendle, Steffen. “Factorization Machines with libFM.” ACM TIST 3 (2012)
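The degree-2 factorization machine in Rendle's paper predicts y(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, and the pairwise term can be computed in O(k · nnz) as ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²]. A minimal pure-Python teaching sketch of that prediction and a plain SGD step on squared loss, assuming sparse rows stored as dicts; it is not libFM, and for implicit feedback a logistic or ranking loss would be more usual than squared loss.

```python
import random
from typing import Dict, List

SparseRow = Dict[int, float]  # feature index -> value (only non-zeros stored)


class FactorizationMachine:
    """Degree-2 FM: y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""

    def __init__(self, n_features: int, k: int = 4, lr: float = 0.05, reg: float = 0.01, seed: int = 0):
        rng = random.Random(seed)
        self.k, self.lr, self.reg = k, lr, reg
        self.w0 = 0.0
        self.w = [0.0] * n_features
        # Small random factors break the symmetry between features.
        self.v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]

    def _factor_sums(self, x: SparseRow) -> List[float]:
        # s_f = sum_i v_{i,f} x_i, shared by the prediction and the gradient.
        return [sum(self.v[i][f] * xi for i, xi in x.items()) for f in range(self.k)]

    def predict(self, x: SparseRow) -> float:
        s = self._factor_sums(x)
        linear = self.w0 + sum(self.w[i] * xi for i, xi in x.items())
        # O(k * nnz) form of the pairwise term:
        # sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f (s_f^2 - sum_i v_{i,f}^2 x_i^2)
        pairwise = 0.5 * sum(
            s[f] ** 2 - sum((self.v[i][f] * xi) ** 2 for i, xi in x.items())
            for f in range(self.k)
        )
        return linear + pairwise

    def sgd_step(self, x: SparseRow, y: float) -> float:
        """One SGD update on the squared loss 1/2 (y_hat - y)^2; returns the pre-update error."""
        s = self._factor_sums(x)
        err = self.predict(x) - y
        self.w0 -= self.lr * err
        for i, xi in x.items():
            self.w[i] -= self.lr * (err * xi + self.reg * self.w[i])
            for f in range(self.k):
                grad = xi * s[f] - self.v[i][f] * xi * xi  # d y_hat / d v_{i,f}
                self.v[i][f] -= self.lr * (err * grad + self.reg * self.v[i][f])
        return err
```

To use it for recommendations, each (user, recipe) interaction could be encoded as one-hot features such as {user_index: 1.0, n_users + recipe_index: 1.0}, optionally extended with recipe or user metadata columns, and `sgd_step` run over the interaction log for several epochs.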
  13. ‘Live’ validation

    Simple experiment
    • Collaborative recommendation engine (v1)
    • Personalised weekly menu reveal email
    • Treatment involved only recipes and their sequence!
    Performance analysis
    • Click-conversion rate is ~10% higher
    • Basket-match rate is ~27% higher
    [Chart: Click-conversion, Basket-match rate]
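The deck reports the lifts but not the sample sizes or how significance was checked. Assuming "~10% higher" means a relative lift of the treatment rate over the control rate, a minimal sketch of that calculation plus a standard pooled two-proportion z-test; the counts below are hypothetical.

```python
import math
from typing import Tuple


def relative_lift(conv_t: int, n_t: int, conv_c: int, n_c: int) -> float:
    """(treatment rate - control rate) / control rate."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    return (p_t - p_c) / p_c


def two_proportion_z_test(conv_t: int, n_t: int, conv_c: int, n_c: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

The same 10% relative lift can be noise or a clear effect depending on volume: 220/1000 vs 200/1000 clicks gives z ≈ 1.1 (not significant at 5%), while 2200/10000 vs 2000/10000 gives z ≈ 3.5.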
  14. Personalisation Pipeline (Airflow)

    [Diagram components: S3 / Redshift; Get user data; Get recipe data; Get orders data; Train model; Encode orders data (batch 1, batch 2, ..., batch n); Recommend recipes (batch 1, batch 2, ..., batch n); SQS / Recipe Service]
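The pipeline trains once and then encodes and scores users batch by batch before handing results to SQS / the Recipe Service. A minimal stdlib sketch of that batching pattern, assuming a hypothetical `score(user, recipe)` function stands in for the trained model and a hypothetical `publish` callback stands in for the queue; these are not real Airflow operators, Redshift queries or SQS client calls.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def run_personalisation(
    user_ids: Iterable[str],
    orders_by_user: Dict[str, List[str]],
    recipes: Sequence[str],
    score: Callable[[str, str], float],               # hypothetical trained model: (user, recipe) -> score
    publish: Callable[[Dict[str, List[str]]], None],  # hypothetical stand-in for pushing one batch downstream
    batch_size: int = 1000,
    top_k: int = 3,
) -> None:
    for batch in batched(user_ids, batch_size):
        # "Encode orders data": here simply the set of recipes each user already ordered.
        encoded = {user: set(orders_by_user.get(user, [])) for user in batch}
        recs: Dict[str, List[str]] = {}
        for user, history in encoded.items():
            candidates = [r for r in recipes if r not in history]
            recs[user] = sorted(candidates, key=lambda r: (-score(user, r), r))[:top_k]
        publish(recs)  # only one batch's results are held in memory at a time
```

Processing users in fixed-size batches bounds peak memory, one of the "work with limited resources" items under Challenges, and independent batches could in principle be spread across processes.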
  15. Challenges

    Work with limited resources
    • Batching
    • Garbage collection
    • Encapsulation
    • Multi-processing / threading
    • Optimise operations with low-level libraries
    • Get more resources
    Integrate with business
    • Robustness (reliability)
    • Validation (performance)
  16. Next Steps

    • Scalable, maintainable code
    • Distributed architecture / auto scaling
    • TDD (Test-Driven Development)
    • Incremental model update
    • Reusable “Data Science” modules
  17. Open roles

    Team objective
    • A growing team focused on delivering step changes in KPIs through deployment and stewardship of algorithms
    [Timeline: Q1 ’17, Q3 ’17, 2018]
  18. Internships

    Data Science Project (12-week summer project)
    • Levels: PhD; MPhil / MASt / M*; UG (final year)
    • Subjects: any quantitative subject; Maths / OR; CompSci; CompBio / Chem; Econ