
Gousto use and love Snowplow

Gousto Tech
February 08, 2017


Our journey of leveraging Snowplow Analytics.


Transcript

  1. Gousto use & love Snowplow
     Dejan Petelin, Head of Data Science
     — Our journey of leveraging Snowplow Analytics …
  2. About Gousto
     • An online recipe box service.
     • Customers come to our site, or use our apps, and select from 22 meals each week.
     • They pick the meals they want to cook and say how many people they're cooking for.
     • We deliver all the ingredients they need, in exact proportions, with step-by-step recipe cards, in 2-3 days.
     • No planning, no supermarkets and no food waste – you just cook (and eat).
     • We're a rapidly growing business.
  3. Our data journey…
     • Transactional database and loads of external data sources, e.g. Excel spreadsheets, 3rd-party tools etc.
     • Multiple ad-hoc analyses, mostly in Excel, which are difficult to update.
     • Gap between web analytics (GA) and transactional data.
     • Lack of customer event logs – so we started snapshotting the transactional database.
     • Loads of questions from our CEO Timo :)
     [Diagram: MySQL transactional read replica, Mailchimp, Excel spreadsheets, Google Analytics, Zendesk, CRM and geo-demographic data feed CRON'ed data processing and ad-hoc analyses into a MySQL data warehouse and Excel reports for stakeholders.]
  4. Growing data capabilities: Data Science, Analytics, Data Engineering
     • As a subscription service we are very retention focused – linking all the data sources is challenging.
     • We believe that data is the voice of our customers, so we try to collect as much data as possible.
     • That is why we invested a lot in Snowplow: we own the data, which is a very valuable asset and core to the business.
     • The data is available to everyone – SQL is a core competency at Gousto.
  5. Our data stack
     [Diagram: Airflow (ETL orchestration) moves data from the transactional DB and the WMS into the data warehouse (lake), which feeds daily email reports, ad-hoc analyses and predictive modelling.]
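
To make the orchestration layer concrete, here is a minimal Airflow DAG sketch for a pipeline of this shape; the script names, schedule and task split are hypothetical, not Gousto's actual ETL.

```python
# Hypothetical sketch of an Airflow DAG in the spirit of the stack above.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="nightly_warehouse_load",
    default_args=default_args,
    start_date=datetime(2017, 1, 1),
    schedule_interval="0 2 * * *",  # nightly, after the transactional snapshot
)

extract = BashOperator(
    task_id="extract_trans_db",
    bash_command="python extract_trans_db.py",  # snapshot the read replica
    dag=dag,
)

load = BashOperator(
    task_id="load_warehouse",
    bash_command="python load_warehouse.py",  # load into the warehouse/lake
    dag=dag,
)

report = BashOperator(
    task_id="daily_email_report",
    bash_command="python send_daily_report.py",
    dag=dag,
)

# Linear dependency chain: extract -> load -> report.
extract.set_downstream(load)
load.set_downstream(report)
```
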
  6. Snowplow as unified log
     [Diagram: customer service, activity log service, order service, product service, recipe service, … publish messages to SNS from the platform deployment bucket; one AWS Lambda subscribes to all messages and writes to Amazon DynamoDB, while another subscribes to customer-related messages and forwards them via the Event API into Snowplow and Amazon Redshift.]
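
A minimal sketch of what the customer-facing half of that fan-out could look like: an SNS-triggered Lambda forwarding each message to a Snowplow collector as a structured event. The collector URL and message fields are assumptions; the `e=se` / `se_ca` / `se_ac` / `se_pr` parameters come from Snowplow's tracker protocol.

```python
# Hypothetical "subscribe to customer-related messages" Lambda.
import json
import urllib.parse
import urllib.request

COLLECTOR = "https://collector.example.com/i"  # assumed collector endpoint

def handler(event, context):
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        # Snowplow structured-event parameters (tracker protocol):
        # e=se marks a structured event; se_ca/se_ac/se_pr carry the payload.
        params = {
            "e": "se",
            "se_ca": message.get("service", "unknown"),     # event category
            "se_ac": message.get("event_type", "unknown"),  # event action
            "se_pr": json.dumps(message),                   # full JSON payload
            "p": "srv",                                     # server-side platform
        }
        urllib.request.urlopen(COLLECTOR + "?" + urllib.parse.urlencode(params))
```
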
  7. Snowplow on isomorphic JS
     • Shiny and super quick, but… what happened to my events?!
     • No page loads – no automatic page views.
     • We developed our own framework for triggering events.
     • We use structured events for that purpose, but store (unstructured) JSON objects in them.
     • This approach keeps us flexible and lets us introduce new events quickly.
     • But no data validation means garbage can leak in (see the sketch after this slide).
     • Data modelling in Redshift.
     [Diagram: Client – Server App – API]
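
One way to contain the garbage problem is to validate the JSON payloads before data modelling. A sketch using the `jsonschema` package; the schema, event and field names are hypothetical.

```python
import json

from jsonschema import ValidationError, validate

# Hypothetical schema for one event type carried inside a structured event.
MENU_VIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "recipe_id": {"type": "integer"},
        "position": {"type": "integer", "minimum": 0},
    },
    "required": ["recipe_id"],
}

def clean_events(raw_events):
    """Yield only events whose JSON payload matches the expected schema."""
    for event in raw_events:
        try:
            payload = json.loads(event["se_property"])  # the JSON stored in the structured event
            validate(payload, MENU_VIEW_SCHEMA)
        except (ValueError, ValidationError):
            continue  # drop garbage instead of letting it leak downstream
        yield {**event, "payload": payload}
```
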
  8. Moving to the real-time pipeline – use case
     [Diagram: (1) Snowplow events stream → (2) churn model processes each event → (3) store churn score → (4) if likely to churn, trigger the gift service → (5) store the action taken, back into Snowplow.]
     • Analyse customer behaviour in real-time.
     • Automatically react as soon as possible.
     • Feed the response back to Snowplow (serving as a unified log).
     • So the whole customer journey is available to CRM & retention teams instantly.
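
A hedged sketch of steps 2-5 as a Kinesis-triggered Lambda (Snowplow's real-time pipeline emits enriched events onto a Kinesis stream). The churn model, table name and threshold below are invented stand-ins for the real components.

```python
import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
scores_table = dynamodb.Table("churn_scores")  # hypothetical table
CHURN_THRESHOLD = 0.8                          # assumed cut-off

def score(event):
    """Stand-in for the real churn model."""
    return 0.9 if event.get("weeks_since_last_order", 0) > 3 else 0.1

def send_gift(customer_id):
    ...  # (5) call the gift service, which logs the action back to Snowplow

def handler(kinesis_event, context):
    for record in kinesis_event["Records"]:
        event = json.loads(base64.b64decode(record["kinesis"]["data"]))
        churn_score = score(event)            # (2) process event
        scores_table.put_item(Item={          # (3) store churn score
            "customer_id": event["customer_id"],
            "score": str(churn_score),
        })
        if churn_score > CHURN_THRESHOLD:     # (4) if likely to churn
            send_gift(event["customer_id"])
```
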
  9. From analytics to optimisation …
     • Daily trading reports, e.g. signups by channel, conversion rate, orders etc.
     • Analytics
     • Customer behaviour
     • Actionable insights
     • Customer segmentation
     • Marketing attribution
     • Channel mix optimisation
     • Churn prediction
     • Automated menu design
     • Warehouse optimisation
     • Tracking performance
     [Chart, source: Gartner – maturity curve from raw data, standard reports and ad-hoc reports ("sense & respond") through generic predictive analytics and predictive modelling to optimisation ("predict & act"); competitive advantage grows with complexity/maturity.]
  10. Churn prediction – intro
      • As a subscription service, we are very retention focused.
      • Some customers are immediately convinced and become very loyal, while others need a bit more effort to get hooked.
      • We use Snowplow events data to model customer behaviour and find the customers most likely to churn, so we can focus on them.
      • We use a personalised approach to retain customers.
  11. Churn prediction – how we did it
      • Iterate over historic events to generate the training dataset (sketched below).
      [Diagram: event timelines for two example customers, Alice and Bob – events before a cut-off become the input data (features), what follows becomes the outcomes (targets).]
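
A sketch of that iteration in pandas, assuming an events table with `customer_id`, `timestamp` and `event_type` columns (illustrative names): features are aggregated from events before a cut-off date, and the churn target is derived from what happens in the following weeks.

```python
import pandas as pd

def build_training_set(events: pd.DataFrame, cutoff: pd.Timestamp,
                       horizon_weeks: int = 4) -> pd.DataFrame:
    """events: one row per Snowplow event (customer_id, timestamp, event_type)."""
    past = events[events["timestamp"] < cutoff]
    future = events[(events["timestamp"] >= cutoff) &
                    (events["timestamp"] < cutoff + pd.Timedelta(weeks=horizon_weeks))]

    # Features: behaviour before the cut-off.
    features = past.groupby("customer_id").agg(
        n_events=("event_type", "size"),
        n_orders=("event_type", lambda s: (s == "order_placed").sum()),
        days_since_last=("timestamp", lambda s: (cutoff - s.max()).days),
    )

    # Target: churned = no order placed within the horizon after the cut-off.
    ordered = future[future["event_type"] == "order_placed"]["customer_id"].unique()
    features["churned"] = (~features.index.isin(ordered)).astype(int)
    return features.reset_index()

# Iterating over many historic cut-offs multiplies the training data:
# frames = [build_training_set(events, c)
#           for c in pd.date_range("2016-01-04", "2016-12-26", freq="W-MON")]
```
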
  12. Churn prediction – pitfalls
      • What actually is churn? How do we define it?
      • It might be better to predict the likelihood of a customer placing an order.
      • How big should the horizon be? Where should we draw the line?
      • With events data there is an almost unlimited number of features – how do we find the really informative ones?
      • How do we keep the model up to date if we are affecting customer journeys?
      • How do we measure success?
      • No matter how accurate the model, it is profit that counts in the end.
  13. Churn prediction – future
      • Predicting when the next event will happen, rather than the probability of an event in the next X weeks.
      • Using (deep) recurrent neural networks (RNNs) to model event sequences directly, rather than engineering features – see the toy sketch below.
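
As a toy illustration of that direction, a Keras sketch of an LSTM mapping a customer's raw event sequence to the number of days until their next event; the layer sizes, vocabulary and data are placeholders, not a trained model.

```python
import numpy as np
from tensorflow import keras

MAX_EVENTS, N_EVENT_TYPES = 50, 200  # sequence length, event vocabulary size

model = keras.Sequential([
    keras.layers.Input(shape=(MAX_EVENTS,), dtype="int32"),
    keras.layers.Embedding(N_EVENT_TYPES, 32),   # learn an embedding per event type
    keras.layers.LSTM(64),                       # summarise the event sequence
    keras.layers.Dense(1, activation="relu"),    # days until next event (>= 0)
])
model.compile(optimizer="adam", loss="mse")

# X: integer-encoded event histories; y: days until each customer's next event.
# Random placeholders here, purely to make the sketch runnable.
X = np.random.randint(0, N_EVENT_TYPES, size=(1000, MAX_EVENTS))
y = np.random.exponential(7.0, size=(1000,))
model.fit(X, y, epochs=2, batch_size=32)
```
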
  14. Churn prediction – results
      • Accuracy of the model is ~80%.
      • A bit too optimistic in the lower region and a bit too pessimistic in the higher region.
      • Significant uplift in retention.
      • Indeed, it depends on the action taken.
      • Loads of A/B testing to find the right actions to take.
      • In the future, we want to build another model, suggesting which action should be taken for each customer.
      • Actually, why not build an autonomous system that tries different approaches and communication channels to find the best one?
      [Charts: calibration plot of predicted likelihood vs. actual proportion; retention results for control, variation A and variation B.]
  15. Automated menu design – intro
      • The food team used to design menus manually – every week.
      • With 22 recipes this task has become too demanding – diversity, multiple constraints, costs etc.
      • They should be focusing on recipe development, to keep delivering delicious recipes.
      • Why not use machine learning to leverage the data, understand customers' taste and design popular menus?
  16. Automated menu design – how it works (I)
      • We developed a very detailed ontology to describe our recipes.
      • We built an internal Slack bot to collect data on recipe similarity.
      • The insights gathered from that data enable us to provide diverse menus.
      • Understanding customers' taste is a crucial part of designing popular menus.
      • Transactional data (orders) is not enough – Snowplow data gives us far more insight into how customers explore menus.
  17. Automated menu design – how it works (II)
      • Multi-objective optimisation:
        • Maximising recipe diversity
        • Maximising menu popularity
        • Balancing costs
        • Matching forecasts
      • Using Genetic Algorithms (GA), as sketched below.
      • Speed is not an issue, as we have a whole week to generate a new menu :)
      • Multiple solutions, so the food team can choose the menu that best fits their objectives.
      [Diagram: genetic algorithm loop – selection → cross-over → mutation → evaluation.]
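
A hedged sketch of that GA loop; the fitness terms are invented stand-ins for the real objectives (diversity, popularity, costs, forecasts). Note that the final population is sorted rather than reduced to a single winner, mirroring the "multiple solutions" point above.

```python
import random

N_RECIPES, MENU_SIZE, POP_SIZE = 200, 22, 100
POPULARITY = [random.random() for _ in range(N_RECIPES)]  # stand-in popularity scores

def random_menu():
    return random.sample(range(N_RECIPES), MENU_SIZE)

def fitness(menu):
    # Invented stand-ins for the real multi-objective criteria.
    diversity = len({r % 10 for r in menu})        # pretend r % 10 is a cuisine tag
    popularity = sum(POPULARITY[r] for r in menu)
    return diversity + popularity

def crossover(a, b):
    # Take half of one parent, fill up from the other, keeping recipes unique.
    return list(dict.fromkeys(a[: MENU_SIZE // 2] + b))[:MENU_SIZE]

def mutate(menu):
    if random.random() < 0.2:  # occasionally swap one recipe for an unused one
        unused = [r for r in range(N_RECIPES) if r not in menu]
        menu[random.randrange(MENU_SIZE)] = random.choice(unused)
    return menu

population = [random_menu() for _ in range(POP_SIZE)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)     # evaluation
    parents = population[: POP_SIZE // 2]          # selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]  # cross-over + mutation
    population = parents + children

best_menus = sorted(population, key=fitness, reverse=True)[:3]  # options for the food team
```
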
  18. Concluding thoughts
      • Snowplow has helped us scale our data capabilities with limited data engineering resources.
      • TIP TO STARTUPS: start building data capabilities as early as possible – data is a huge asset.
      • Snowplow also serves us as a unified log.
      • Not necessarily limited to customer-focused data.
      • Snowplow enables us to 'listen' to our customers and provide them with a more personalised experience.
      • Moving to the real-time pipeline to realise just-in-time personalisation, e.g. personalised recipe ordering, add-on recommendations (upselling) etc.