Data
• Search for patterns
• Propose a story (model) of how the system works
• Validate the story (macro)
• Tell the story
• Story is used to predict outcome of decisions

Agent Based Modeling: Dramatic re-enactment using Data
• Search for patterns
• Propose a story (model) of how the elements of the system work
• Validate the story (micro & macro)
• Re-enact the story (simulation)
• Story is used to observe impact propagation and predict outcome of decisions
for:
• Recommendation system: given a product category, which actual SKU from that category should we recommend during the checkout process?
• Product design
  - predicting performance in the market, i.e. will customers choose it over other products in the same category?
  - optimization: what needs to be changed in order to perform well?
Category          SKUs   Number of Transactions
Fresh Juice         79                   17,000
Mayo                96                  200,000
Body Wash          175                   50,000
Peanut Butter      133                  185,000
Salad Dressing     450                  200,000

One year of data, from one large supermarket.
• consistency
• pack size
Pricing and Promotions
• Price
• Promotions
Social Network effects / interaction
• location
• gender
• ethnicity
• household income
Ideal Point Distance
• extracted using half of the data
• needs to be updated periodically
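Ideal Point Distance is described only as a feature extracted from half of the data. A minimal sketch of one way to compute it, assuming each SKU is encoded as a numeric attribute vector (for example pack size and a consistency score), that a customer's ideal point is the mean attribute vector of the SKUs they bought in the training half, and that distance is Euclidean. The functions `ideal_point`, `ideal_point_distance` and `split_half` are hypothetical names, not the author's method.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def ideal_point(purchased: List[Vector]) -> List[float]:
    """Hypothetical ideal point: mean attribute vector of a customer's past purchases."""
    if not purchased:
        raise ValueError("need at least one purchase to estimate an ideal point")
    dims = len(purchased[0])
    return [sum(v[d] for v in purchased) / len(purchased) for d in range(dims)]

def ideal_point_distance(ideal: Vector, sku: Vector) -> float:
    """Euclidean distance between the customer's ideal point and a SKU's attributes."""
    return math.dist(ideal, sku)

def split_half(transactions: list) -> Tuple[list, list]:
    """First half (e.g. chronologically) estimates ideal points; second half evaluates."""
    mid = len(transactions) // 2
    return transactions[:mid], transactions[mid:]
```

Because the ideal point is estimated once from a fixed slice of history, it goes stale as preferences drift, which is why it needs periodic re-estimation.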
200,000 transactions
• Social Net has the smallest weight, but eliminating it degrades the model significantly
• W_trembling_hand = 0
• MaxUtil: 35%, Random: 2%
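The MaxUtil vs. Random figures compare a model's predictions with a random baseline. A minimal sketch of such a comparison, assuming the metric is the share of test transactions in which the predicted SKU equals the purchased one (the slide does not define it) and that each transaction records the SKUs available in the category. The record layout and the `hit_rate` / `random_baseline` names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One test transaction: (customer_id, SKUs available in the category, SKU actually bought)
Transaction = Tuple[str, Sequence[str], str]
# A predictor maps (customer_id, available SKUs) to a single predicted SKU
Predictor = Callable[[str, Sequence[str]], str]

def hit_rate(transactions: List[Transaction], predict: Predictor) -> float:
    """Share of transactions where the predicted SKU equals the purchased SKU."""
    hits = sum(1 for customer, options, bought in transactions
               if predict(customer, options) == bought)
    return hits / len(transactions)

def random_baseline(transactions: List[Transaction]) -> float:
    """Expected hit rate of a uniform random pick among the available SKUs."""
    return sum(1 / len(options) for _, options, _ in transactions) / len(transactions)
```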
action. This should be accounted for somehow.
Adding is_last_choice and frequency does NOT improve the existing model:

$$u_i = w_1\,\mathrm{IdealPointDistance}_i + w_2\,\mathrm{price}_i + w_3\,\mathrm{promotions}_i + w_4\,\mathrm{social\_net}_i + w_5\,\mathrm{is\_last\_choice}_i + w_6\,\mathrm{frequency}_i$$
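The utility above is a weighted sum of per-SKU features, one weight per feature. A minimal sketch, assuming every feature is already computed as a number per SKU; the dictionary keys mirror the formula's feature names, while the weight values and their signs (negative for distance and price, so farther or pricier SKUs score lower) are hypothetical. Leaving a feature out of `weights` drops its term, which is one way to run the is_last_choice / frequency comparison.

```python
from typing import Dict, Mapping

def utility(features: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """u_i = sum over k of w_k * feature_k(i); only features listed in `weights` contribute."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def score_category(skus: Mapping[str, Mapping[str, float]],
                   weights: Mapping[str, float]) -> Dict[str, float]:
    """Utility of every SKU in a product category, keyed by SKU."""
    return {sku: utility(feats, weights) for sku, feats in skus.items()}

# Hypothetical weights for the base model (w1..w4).
BASE_WEIGHTS = {"ideal_point_distance": -1.0, "price": -0.5,
                "promotions": 0.3, "social_net": 0.1}
# Hypothetical extended model adding the two extra terms.
EXTENDED_WEIGHTS = {**BASE_WEIGHTS, "is_last_choice": 0.4, "frequency": 0.2}
```

Scoring the same SKUs under `BASE_WEIGHTS` and `EXTENDED_WEIGHTS` and feeding both into the same evaluation gives a like-for-like comparison of the two models.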
each choice spikes when it is made, then decays over time unless it is reinforced by the customer making the same choice again.

$$u_i = w_1\,\mathrm{IdealPointDistance}_i + w_2\,\mathrm{price}_i + w_3\,\mathrm{promotions}_i + w_4\,\mathrm{social\_net}_i + w_5\,\mathrm{memory\_value}_i$$

$$\text{Predicted Choice} = \begin{cases} \arg\max_i u_i & \text{if rational choice} \\ \mathrm{Random}(i) & \text{if trembling hand} \end{cases}$$
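The memory term and the choice rule can be sketched as follows. The slide only says that memory spikes when a choice is made and fades unless reinforced; the exponential decay, the +1 spike and the `half_life_days` parameter are assumptions. The trembling hand is modelled as a hypothetical probability of picking a SKU uniformly at random instead of the utility maximizer; with it set to 0 the rule reduces to pure MaxArg.

```python
import math
import random
from typing import Dict, Mapping, Optional

class Memory:
    """Hypothetical memory_value: +1 spike per purchase, exponential decay between purchases."""

    def __init__(self, half_life_days: float):
        self.decay_rate = math.log(2) / half_life_days
        self.value: Dict[str, float] = {}
        self.last_update: Dict[str, float] = {}

    def get(self, sku: str, now: float) -> float:
        """Memory of `sku` at time `now` (in days), decayed since its last update."""
        if sku not in self.value:
            return 0.0
        elapsed = now - self.last_update[sku]
        return self.value[sku] * math.exp(-self.decay_rate * elapsed)

    def record_choice(self, sku: str, now: float) -> None:
        """Decay the stored value up to `now`, then add a spike for the new purchase."""
        self.value[sku] = self.get(sku, now) + 1.0
        self.last_update[sku] = now

def predict_choice(utilities: Mapping[str, float], trembling_hand: float,
                   rng: Optional[random.Random] = None) -> str:
    """MaxArg(u_i) with probability 1 - trembling_hand, otherwise a uniformly random SKU."""
    rng = rng or random.Random()
    if rng.random() < trembling_hand:
        return rng.choice(list(utilities))
    return max(utilities, key=utilities.get)
```

Because the memory state is updated incrementally from each new purchase, a model built on it can keep running on-line without a separate re-training pass.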
better than the original model, especially for categories with lots of choices
• Makes Ideal Points redundant, resulting in a completely on-line model (no need for re-training)
• Reduces the influence of price and promotions

Category          SKUs   Transactions
Fresh Juice         79         17,000
Mayo                96        200,000
Body Wash          175         50,000
Peanut Butter       80        185,000
Salad Dressing     450        200,000

Category        Max Util   Habitual
Juice               0.18       0.40
MayoNew             0.33       0.49
BodyWash            0.08       0.30
PeanutButter        0.20       0.44
SaladDressing       0.04       0.25
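The gain of the habitual model can be read directly off the Max Util / Habitual numbers. A small sketch that computes the absolute difference and the ratio per category, using only the values from the table and assuming both columns are on the same scale.

```python
from typing import Tuple

# Values copied from the Max Util / Habitual table: (Max Util, Habitual)
RESULTS = {
    "Juice": (0.18, 0.40),
    "MayoNew": (0.33, 0.49),
    "BodyWash": (0.08, 0.30),
    "PeanutButter": (0.20, 0.44),
    "SaladDressing": (0.04, 0.25),
}

def improvement(max_util: float, habitual: float) -> Tuple[float, float]:
    """Absolute gain (habitual - max_util) and ratio (habitual / max_util)."""
    return habitual - max_util, habitual / max_util

for category, (mu, hab) in RESULTS.items():
    gain, ratio = improvement(mu, hab)
    print(f"{category:14s} +{gain:.2f}  x{ratio:.2f}")
```

On these numbers the ratio is largest for SaladDressing (0.25 / 0.04 = 6.25) and BodyWash (0.30 / 0.08 = 3.75), the two categories with the most SKUs, which is consistent with the "lots of choices" remark.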
[Figure: number of customers vs. % switches per customer]
• 30% of the customers behaved according to the model more than 90% of the time
• 30% of the customers did not behave according to the model more than 90% of the time
• Not all customers have made the same number of transactions
• The model is not perfect, but it seems to accurately capture the behavior of a significant part of the customers
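Statements such as "30% of the customers behaved according to the model more than 90% of the time" come from a per-customer agreement rate. A minimal sketch, assuming one record per transaction holding the model's prediction and the actual purchase; the record layout, the function names and the hypothetical `min_transactions` filter (a way to handle customers with very few transactions) are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per transaction: (customer_id, predicted_sku, purchased_sku)
Record = Tuple[str, str, str]

def agreement_by_customer(records: Iterable[Record],
                          min_transactions: int = 1) -> Dict[str, float]:
    """Fraction of each customer's transactions where the prediction matched the purchase."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for customer, predicted, bought in records:
        totals[customer] += 1
        hits[customer] += predicted == bought  # bool counts as 0 or 1
    # Skip customers with too few transactions to give a meaningful rate
    return {c: hits[c] / totals[c] for c in totals if totals[c] >= min_transactions}

def share_above(rates: Dict[str, float], threshold: float = 0.9) -> float:
    """Share of customers whose agreement rate exceeds `threshold`."""
    if not rates:
        raise ValueError("no customers to evaluate")
    return sum(rate > threshold for rate in rates.values()) / len(rates)
```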
30% of the customers, but only partially the behavior of the others.
• What makes these 30% different from the others? How is their story different?
• Can it be figured out with the data that we have?
• What are the triggers and attributes that make people behave in a non-habitual manner?