• Effectively encoding categorical columns
  ◦ Clustering and extracting features from the columns the trees rank highest in feature importance
• Dealing with missing values
  ◦ KNN imputation
• High memory usage: large number of rows = long processing time
  ◦ Used SQL for queries
  ◦ Bootstrapped significantly: bagging principle
  ◦ Fast iteration: we didn’t have a model that went through the entire dataset
  ◦ All models were bootstrapped instances that were averaged out
  ◦ Used fast.ai to ensure trees weren’t correlated, to maximize the feature insights acquired
• Overfitting
  ◦ L1 regularization does the trick
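The bagging idea in the bullets above (no single model sees the entire dataset; every model is fit on a bootstrapped sample and the predictions are averaged) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the one-split regression stump stands in for the trees, the slides' fast.ai setup is not used, and `fit_stump`, `bagged_model`, and the synthetic `rows` data are all hypothetical names and values chosen for the example.

```python
import random
from statistics import mean

def fit_stump(rows):
    """Fit a one-split regression stump on (x, y) pairs; return a predict function."""
    best = None
    xs = sorted({x for x, _ in rows})
    # Try a threshold halfway between each pair of adjacent distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # Only one distinct x in the sample: fall back to a constant prediction.
        m = mean(y for _, y in rows)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagged_model(rows, n_models=25, sample_size=200, seed=0):
    """Fit each stump on a small bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sampling with replacement: no model ever sees the full dataset.
        sample = [rng.choice(rows) for _ in range(sample_size)]
        models.append(fit_stump(sample))
    return lambda x: mean(m(x) for m in models)

# Synthetic data (an assumption for illustration): spend jumps once x reaches 50.
rows = [(i % 100, 2.0 if i % 100 < 50 else 10.0) for i in range(10_000)]
predict = bagged_model(rows)
low_side, high_side = predict(10), predict(90)
```

Because each stump trains on only 200 sampled rows, iteration stays fast even as the full table grows; the cost is extra variance per model, which the averaging step is meant to smooth out.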
of the week that promos come out have a relatively significant effect on purchases and campaign came in
• There are shoppers that shop heavily on specific days
• Customers buy the same way
• Brand type released on certain dates (start of week, seasonality)
• Coupons that can be used for many items are redeemed more often
• Clustering: looking at price sensitivities across brands and categories
• Item code - Categories - Brand: mapping and analysis
• Matrix factorization for coupon recommendation
• Collect data on profit per item (sales ≠ profit) to effectively measure profitability of the marketing campaign
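As a rough idea of what the matrix factorization item could look like, the sketch below learns low-dimensional customer and coupon factors from observed redemptions using plain SGD. Everything here is an assumption made for illustration: the `factorize`, `score`, and `mse` helpers, the hyperparameters, and the toy `observed` triples (1.0 = redeemed, 0.0 = offered but not redeemed) are hypothetical and not taken from the project data.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """SGD matrix factorization over observed (user, item, value) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 shrinkage on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def score(P, Q, u, i):
    """Predicted affinity of customer u for coupon i (dot product of factors)."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def mse(P, Q, ratings):
    """Mean squared error over the observed triples."""
    return sum((r - score(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

# Toy data (hypothetical): customers 0-1 favor coupons 0-1, customers 2-3 favor coupons 2-3.
observed = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 3, 0.0),
    (2, 2, 1.0), (2, 3, 1.0), (2, 0, 0.0),
    (3, 2, 1.0), (3, 1, 0.0),
]
P0, Q0 = factorize(observed, n_users=4, n_items=4, epochs=0)  # untrained baseline
P, Q = factorize(observed, n_users=4, n_items=4)
error_before, error_after = mse(P0, Q0, observed), mse(P, Q, observed)
```

To recommend, one would rank the coupons a customer has not yet been offered by `score(P, Q, u, i)` and send the top few; comparing `error_before` with `error_after` is only a sanity check on the fit, not an evaluation of recommendation quality.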