Data Science at honestbee - DSSG 2016-10-24

Dat Le (@lenguyenthedat)
Data Science at honestbee
24th Oct 2016 - Data Science SG

honestbee

• What is honestbee?
• Full-service online grocery + laundry delivery company
• Singapore - Hong Kong - Taiwan - Japan
• Malaysia - Philippines - Indonesia - Thailand
• Wide range of supermarkets and boutique stores
• Referral (GIVE $20, GET $10): https://honestbee.sg/r/DATL8886. Let me know!

Data Science

Predictive models
• Item availability predictions
• Customer lifetime value / customer profitability grading
• Customer demand forecast & trending
Recommendation engines
• Item-based recommendations
• CRM campaign recommendations
Clustering analysis, data mining
• Customer segmentation (profiling, 360 view, clustering)
Operational optimizations
• Task scheduling
• Route optimization

Item Availability Prediction

What?
• Item not available at the store!
• We don’t know until the bee is picking the item
Why?
• Customer happiness
• Business profitability
How?
• Predictive model (binary classification: available vs. out of stock)
• Communicate with our customers before they even make a purchase

Features
• Date of delivery (day of week, time slot)
• Product metadata (brand, name, category, price, discount)
• Store metadata (store type, location)
• External data (weather, public holidays, promotion periods, financial data: STI (Straits Times Index), inflation rate, unemployment rate)
• Ground truth (Available vs. Out of Stock)
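The minimal Python sketch below makes the feature list concrete. It flattens one delivery, product and store into a numeric feature row that a tree model can consume. The field names, the `TIME_SLOTS` and `STORE_TYPES` category lists and the `build_features` helper are hypothetical, because the deck does not describe honestbee's actual schema. Each row would be paired with its ground-truth label, for example 1 = out of stock and 0 = available.

```python
from datetime import date

# Hypothetical category lists (assumption: the deck does not give the real encodings)
TIME_SLOTS = ["morning", "afternoon", "evening"]
STORE_TYPES = ["supermarket", "boutique"]


def one_hot(prefix, value, choices):
    """Encode a categorical value as 0/1 columns, one per known choice."""
    return {f"{prefix}_{choice}": int(value == choice) for choice in choices}


def build_features(delivery_date, time_slot, product, store, public_holidays):
    """Flatten one (delivery, product, store) combination into a feature row.

    Hypothetical helper: field names are illustrative, not honestbee's schema.
    """
    row = {
        # Date of delivery
        "day_of_week": delivery_date.weekday(),  # Monday=0 ... Sunday=6
        # External data
        "is_public_holiday": int(delivery_date in public_holidays),
        # Product metadata
        "price": product["price"],
        "discount": product.get("discount", 0.0),
    }
    # Categorical features become 0/1 columns
    row.update(one_hot("slot", time_slot, TIME_SLOTS))
    row.update(one_hot("store", store["type"], STORE_TYPES))
    return row


row = build_features(
    date(2016, 10, 24),
    "evening",
    product={"price": 3.50, "discount": 0.5},
    store={"type": "supermarket"},
    public_holidays=set(),
)
print(row)
```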

Algorithm: XGBoost (https://github.com/dmlc/xgboost)
• Decision-tree-based Gradient Boosting Machine
• Available in Python, R, and Julia
• State-of-the-art, winning algorithm in many Kaggle data science challenges:
  • 1st @ Crowdflower Search Results Relevance
  • 1st @ Microsoft Malware Classification Challenge (BIG 2015)
  • 1st @ Tradeshift Text Classification
  • 1st @ Otto Group Product Classification
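The sketch below illustrates the idea behind XGBoost in plain Python rather than guessing at a particular XGBoost configuration. It implements gradient boosting with logistic loss: each round fits a depth-1 decision tree (a "stump") to the residuals y - p and adds a damped copy of it to the model's score. This is a swapped-in, simplified technique and not XGBoost's actual algorithm, which also uses second-order gradient information, regularised leaf weights, deeper trees and much faster split finding. The function names and toy data are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the (feature, threshold) split with the lowest squared error
    against the residuals; each leaf predicts its mean residual."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            error = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, j, threshold, lm, rm)
    if best is None:  # no valid split: use a constant leaf
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """First-order gradient boosting for labels y in {0, 1} with log-loss."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p / (1 - p))  # start from the log-odds of the base rate
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss w.r.t. the raw score is y - sigmoid(score)
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * predict_stump(stump, row) for s, row in zip(scores, X)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict_proba(model, row):
    score = model["base"] + sum(
        model["learning_rate"] * predict_stump(stump, row) for stump in model["stumps"]
    )
    return sigmoid(score)


# Hypothetical toy data: feature 0 carries the signal, feature 1 is noise
X = [[i, i % 2] for i in range(10)]
y = [int(i > 5) for i in range(10)]
model = fit_gbm(X, y)
print(predict_proba(model, [8, 0]), predict_proba(model, [2, 1]))
```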

Evaluation metric: AUC (Area Under the ROC Curve) score
• http://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
• AUC vs. accuracy (ACC): http://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy
• Not affected by highly skewed datasets
• AUC score ranges:
  • 0.5-0.6 (Fail)
  • 0.6-0.7 (Poor)
  • 0.7-0.8 (Fair)
  • 0.8 (Good) to 1.0 (Perfect)
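The AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting half. The plain-Python sketch below computes it that way. On hypothetical skewed data, it shows how accuracy can look good while AUC exposes a useless model. In practice, scikit-learn's `roc_auc_score` computes the same quantity directly from labels and scores. The linked `sklearn.metrics.auc` instead integrates an arbitrary curve from its points, for example the output of `roc_curve`.

```python
def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. O(P * N) pairwise comparison, fine for a sketch.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(y_true, y_pred):
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical skewed data: 95 items available (0), 5 out of stock (1)
y_true = [0] * 95 + [1] * 5
# A useless model: predicts "available" for everything, same score for all
constant_scores = [0.0] * 100
print(accuracy(y_true, [0] * 100))  # 0.95
print(auc(y_true, constant_scores))  # 0.5
```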

Item Availability Prediction (On Production)
[Screenshot: “Buy me!” / “Likely Out of Stock!”]

Item-based Recommendation Engine

What?
• Recommendation engine
• “People who bought Tortilla Chips also bought Coca Cola Zero”
Why?
• Better user experience
• Increase cart size
How?
• Collaborative filtering
• Python Pandas + Jaccard index

Collaborative Filtering
• Traditional & popular technique used in recommendation systems
• Input: User-Item matrix
  • continuous values: user ratings (from 1* to 5*, 0% to 100%)
  • binary values: user behavior (purchases / visits / clicks)
• 2 different methodologies: user-based and item-based recommendations
https://buildingrecommenders.wordpress.com/
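Below is a minimal sketch of the binary (implicit-feedback) variant of the User-Item matrix, built from a log of purchase events. Repeat purchases still count as 1. The `user_item_matrix` helper and the sample purchases are hypothetical.

```python
def user_item_matrix(purchases):
    """Build a binary User-Item matrix from (user, item) purchase events.

    Returns (users, items, matrix), where matrix[u][i] is 1 if users[u]
    bought items[i] at least once, else 0.
    """
    users = sorted({u for u, _ in purchases})
    items = sorted({i for _, i in purchases})
    user_index = {u: k for k, u in enumerate(users)}
    item_index = {i: k for k, i in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item in purchases:
        matrix[user_index[user]][item_index[item]] = 1  # repeat purchases stay 1
    return users, items, matrix


# Hypothetical purchase log
purchases = [
    ("alice", "tortilla chips"), ("alice", "coca cola zero"),
    ("bob", "tortilla chips"), ("bob", "coca cola zero"), ("bob", "salsa"),
    ("carol", "diapers"), ("alice", "coca cola zero"),
]
users, items, m = user_item_matrix(purchases)
for user, row in zip(users, m):
    print(user, row)
```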

Collaborative Filtering
• User-based: “users like you usually buy these”
  • works for social networks
  • works for “taste”-like recommendations (e.g. movies, fashion, social networks)
  • output: User-User matrix
  • computation cost scales with the number of users
  • user home page, emails, in-app notifications
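Here is a small sketch of user-based filtering, assuming each user is represented by the set of items they bought. It finds the `k` users most similar to the target, then scores items those neighbours bought that the target has not. The similarity used here is the Jaccard index of the users' item sets, which is an assumption made for illustration. The sketch computes only the target's row of the User-User matrix on the fly, whereas a full User-User matrix needs a similarity for every pair of users. Helper names and data are hypothetical.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(baskets, target, k=2, n=3):
    """'Users like you usually buy these': score items unseen by the target
    by summing the similarity of the k nearest users who bought them."""
    mine = baskets[target]
    neighbours = sorted(
        ((jaccard(mine, items), user) for user, items in baskets.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for similarity, user in neighbours:
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for item in baskets[user] - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical users and the sets of items they bought
baskets = {
    "alice": {"chips", "coke zero"},
    "bob": {"chips", "coke zero", "salsa"},
    "carol": {"diapers", "wipes"},
    "dave": {"chips", "salsa", "beer"},
}
print(recommend_user_based(baskets, "alice"))  # ['salsa', 'beer']
```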

Collaborative Filtering
• Item-based: “users who bought X also bought Y”
  • complementary purchases (e-commerce), news suggestions
  • output: Item-Item matrix
  • computation cost scales with the number of items
  • product page, cart page recommendations
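For the item-based direction, the simplest Item-Item matrix counts how many users bought both items. The sketch below builds those counts and answers “users who bought X also bought Y”. Its helper names and data are hypothetical. Raw counts tend to favour popular items, which is one reason to normalise them with a similarity measure such as the Jaccard index.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(user_baskets):
    """Item-Item co-occurrence: number of users who bought both items."""
    counts = Counter()
    for basket in user_baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


def also_bought(counts, item, n=3):
    """'Users who bought X also bought Y', ranked by raw co-occurrence count."""
    related = [(other, c) for (x, other), c in counts.items() if x == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))  # ties: alphabetical
    return [other for other, _ in related[:n]]


# Hypothetical per-user sets of purchased items
user_baskets = [
    {"tortilla chips", "coca cola zero", "salsa"},
    {"tortilla chips", "coca cola zero"},
    {"tortilla chips", "salsa"},
    {"diapers", "wipes"},
]
counts = co_purchase_counts(user_baskets)
print(also_bought(counts, "tortilla chips"))  # ['coca cola zero', 'salsa']
```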

Algorithm: Jaccard Index (https://en.wikipedia.org/wiki/Jaccard_index)
• Set theory
• Ratio of intersection to union gives the similarity score
• Sensitive to sparse input
$J(v_1, v_2) = \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$
where $U_1$ and $U_2$ are the sets of users who bought items $v_1$ and $v_2$. In the slide's example, $J = 2/6$.
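The formula translates directly into Python sets. The user sets below are hypothetical and were chosen only so that 2 of the 6 users are shared, which reproduces J = 2/6. The last line illustrates the sparsity issue: two items bought by the same single user get a perfect score from one observation.

```python
def jaccard(u1, u2):
    """J(v1, v2) = |U1 ∩ U2| / |U1 ∪ U2|, with U = set of users who bought the item."""
    union = u1 | u2
    if not union:
        return 0.0  # neither item has any buyer: define similarity as 0
    return len(u1 & u2) / len(union)


# Hypothetical buyer sets: 2 shared users, 6 users in total
u1 = {"ann", "ben", "cat", "dan"}
u2 = {"cat", "dan", "eve", "fay"}
print(jaccard(u1, u2))  # 2 / 6

# Sparse input: one shared buyer is enough for a "perfect" similarity
print(jaccard({"ann"}, {"ann"}))  # 1.0
```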

Pandas: http://pandas.pydata.org/
• Python
• Data analysis toolkit
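With Pandas, the whole Item-Item Jaccard matrix can be computed at once. `pd.crosstab` builds the User-Item matrix, and a matrix product gives the intersection sizes |U1 ∩ U2|. The union then follows from |U1 ∪ U2| = |U1| + |U2| - |U1 ∩ U2|. This is a sketch on a hypothetical purchase log, not honestbee's pipeline. A dense matrix like this is only practical for a modest number of items.

```python
import pandas as pd

# Hypothetical purchase log: one row per (user, item) purchase event
purchases = pd.DataFrame({
    "user": ["ann", "ann", "ben", "ben", "ben", "cat"],
    "item": ["chips", "coke zero", "chips", "coke zero", "salsa", "diapers"],
})

# Binary User-Item matrix: 1 if the user bought the item at least once
user_item = (pd.crosstab(purchases["user"], purchases["item"]) > 0).astype(int)

# |U1 ∩ U2| for every item pair via a matrix product (Item x Item)
intersection = user_item.T.dot(user_item)

# |U| per item = number of distinct buyers
buyers = user_item.sum(axis=0)

# |U1 ∪ U2| = |U1| + |U2| - |U1 ∩ U2|
union = pd.DataFrame(
    buyers.values[:, None] + buyers.values[None, :],
    index=buyers.index,
    columns=buyers.index,
) - intersection

jaccard = intersection / union


def most_similar(item, n=2):
    """Top-n items by Jaccard similarity, excluding the item itself."""
    return jaccard[item].drop(item).sort_values(ascending=False).head(n)


print(most_similar("chips"))
```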

Item-based Recommendation Engine On Production (soon!): Cooking ingredients!

Item-based Recommendation Engine On Production (soon!): Baby products!

Item-based Recommendation Engine On Production (soon!): BBQ-style Parties!

Data Infrastructure

Data Infrastructure: Auto Integration & Deployment
https://mesosphere.com/blog/2015/04/02/continuous-deployment-with-mesos-marathon-docker/

• Platform: Amazon Web Services with EC2, S3, RDS Postgres, and Redshift
• Application: Docker, Airflow
• Code review, test and integration: GitHub + Travis CI
• Resource management: Apache Mesos, AWS Autoscaling
• Application & discovery management: Marathon (Mesosphere)
• Languages: Python, SQL

the end
