

Gerben Oostra
November 02, 2021

Causal inference made easy with Inverse Propensity Weighting

Machine learning is often used for predictive modelling, which predicts how a certain system will behave. But what we actually want is to improve a system: for example, by choosing which people to call, which discounts to give, or which products to recommend. This is not predictive but prescriptive modelling. Using causal inference techniques, one can predict how the system will behave when we change it, so that we can choose the best action. One causal inference technique is "Inverse Propensity Weighting". I'll explain how it works and why it makes sense.


Transcript

  1. Causal inference made easy with Inverse Propensity Weighting
    Gerben Oostra, Senior Data Scientist @ Vianai Systems IL
    © Vianai Systems, Inc. Proprietary & Confidential
  2. Gerben Oostra, Senior Data Scientist @ Vianai Systems Israel
    Living in Haarlem, the Netherlands
    Interests: Causal Inference, Contextual Bandits, ML Engineering, trail running
  3. Build the world’s first Decision Optimization Platform to reliably prescribe, validate and execute actions that optimize business outcomes
    [Diagram: Outcome / Action: self service, causal inference, actual impact]
  4. https://www.linkedin.com/company/vianai/jobs/
  5. Causal Inference
  6. Why Causal Inference matters
    § Descriptive (statistics): What happened?
    § Predictive (correlations): Given X, what is Y?
    § Prescriptive (causality): Which action should I take?
  7. Examples: predictive vs. prescriptive questions
    § Acquisition. Predictive: How likely is this prospect? Prescriptive: Would it help if I called?
    § Product engagement. Predictive: Will X be a MAU? Prescriptive: Will an email improve engagement?
    § Churn. Predictive: How likely will X churn? Prescriptive: Who should I call to prevent churn?
    Predictive keeps everything as it is; prescriptive considers taking different actions.
  8. The goal of causal inference: to predict the effect of an action, given the known context
    This answers questions like:
    § What if I do ...? Why? (Intervention)
    § What if I had ...? Why? (Counterfactuals)
  9. Inverse Propensity Weighting
  10. The challenge in Causal Inference
    Causal graph: features x influence both treatment T and result R, and T influences R.
    Historically (past campaigns), we only observe correlations, and correlation is not causation.
    For the future (our model), we need causation for prescriptive power.
    Correlation has predictive power, but good predictive power != prescriptive power.
  11. Determining unbiased reward estimates
    Randomized Controlled Trial (RCT): run an experiment with random treatment assignment, so there is no bias.
    + Simple & straightforward
    + Correlation is again causation
    - Expensive: discards operational data, past & future
    Observational data: use historically collected data and adjust for biases.
    + Uses historical data
    + Can use future data: a feedback loop
    - Need to know all confounders
    - The system in which you operate could be too deterministic
  12. Unbiased reward predictions using observational data
    Steps:
    1. Include confounders. If your direct model is 100% accurate (RMSE ≈ 0 / AUC ≈ 1), you're already fine; if it has errors, they are mainly minimized for records matching the current policy.
    2. Inverse propensity weighting (today). IPW ensures the errors are random across the population, generalizing the predictions.
  13. Causal inference with inverse propensity weighting
    Causal graph: features x (age, location, ...) influence both treatment T and result R.
    A propensity model learns that correlation: P(T | x).
    Then weight each sample inversely to its propensity: ω_i = 1 / P(T | x).
  14. How to do inverse propensity weighting
    1. Train the propensity model
    2. Predict the propensity for the observed treatments
    3. Train the outcome model on inverse-propensity-weighted samples
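The three steps can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the presenter's actual pipeline; the model choices and the data-generating coefficients are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic data: the first feature confounds treatment and outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # treatment depends on x
r = 2.0 * t + X[:, 0] + rng.normal(size=1000)    # true effect is +2.0

# 1. Train the propensity model P(T=1 | x)
propensity_model = LogisticRegression().fit(X, t)

# 2. Predict the propensity of the treatment each unit actually received
p_treated = propensity_model.predict_proba(X)[:, 1]
p_observed = np.where(t == 1, p_treated, 1 - p_treated)

# 3. Train the outcome model on inverse-propensity-weighted samples
weights = 1.0 / p_observed
outcome_model = LinearRegression()
outcome_model.fit(np.column_stack([X, t]), r, sample_weight=weights)
```

The coefficient on the treatment column of `outcome_model` then estimates the treatment effect.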
  15. Test case: a biased dataset
    [Causal graph: features x influence treatment T and result R; T influences R]
  16. Our biased dataset
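A biased dataset like this can be simulated by letting a single confounder drive both the treatment assignment and the outcome. The coefficients below are hypothetical, chosen only to make the bias visible:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)                  # confounder: affects both T and R

# Biased treatment assignment: higher x -> more likely to be treated
p_treat = 1 / (1 + np.exp(-2 * x))
t = rng.binomial(1, p_treat)

# True treatment effect is +1.0, but x also raises the outcome
r = 1.0 * t + 3.0 * x + rng.normal(size=n)

# A naive difference of group means is badly biased upward,
# because treated units have systematically higher x
naive_ate = r[t == 1].mean() - r[t == 0].mean()
```

Here `naive_ate` lands far above the true effect of 1.0, which is exactly the trap the following slides address.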
  17. Direct model (no IPW)
    Model: R = C₀ + C₁·X + C₂·T
    Loss function: RMSE
    Effect = m(X, 1) − m(X, 0) = C₂
    Single model, aka S-learner
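The S-learner can be written as a small function; the linear model and the data layout are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def s_learner_ate(X, t, r):
    """Estimate the ATE with a single ('S') model over features + treatment.

    The effect is the mean of m(X, 1) - m(X, 0) over the data; for a
    linear model this equals the coefficient on the treatment column.
    """
    model = LinearRegression().fit(np.column_stack([X, t]), r)
    m1 = model.predict(np.column_stack([X, np.ones(len(X))]))
    m0 = model.predict(np.column_stack([X, np.zeros(len(X))]))
    return (m1 - m0).mean()
```

On randomized data this recovers the true effect; on a confounded dataset it does not, which is the point the next slide makes.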
  18. Direct model (no IPW)
    The chosen model is not capable of predicting the individual treatment effect; with the RMSE loss function we expect to get the Average Treatment Effect (ATE).
    [Plot: our model's estimate vs. the actual ATE; the gap is bias due to confounders]
  19. Making it work – Propensity model
    [Plots: treatment propensity P(T|X); calibration plot (reliability curve) of the actual fraction of positives vs. the mean predicted value]
  20. Making it work – Calibrated propensity model
    [Plots: treatment propensity P(T|X); calibration plot (reliability curve) of the actual fraction of positives vs. the mean predicted value]
  21. Propensity weighting: clipping
    The weights ω_i = 1 / P(T|x) explode as P(T|x) → 0 (ω_i → ∞). Two remedies:
    1. Clip the value: P̃(T|x) = min(0.95, max(0.05, P(T|x)))
    2. Trim the dataset: ω′_i = P(T|x)⁻¹ if 0.05 ≤ P(T|x) ≤ 0.95, and ω′_i = 0 otherwise
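Both remedies are one-liners with NumPy. The 0.05/0.95 bounds match the slide; everything else is an illustrative sketch:

```python
import numpy as np

def clipped_weights(p_observed, lo=0.05, hi=0.95):
    """Clip propensities into [lo, hi] before inverting, bounding the weights."""
    return 1.0 / np.clip(p_observed, lo, hi)

def trimmed_weights(p_observed, lo=0.05, hi=0.95):
    """Give weight 0 to samples with extreme propensities (drops them)."""
    w = 1.0 / p_observed
    w[(p_observed < lo) | (p_observed > hi)] = 0.0
    return w
```

Clipping keeps every sample but caps its influence; trimming discards extreme samples entirely, trading bias for variance in opposite directions.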
  22. Making it work with IPW
    With the inverse-propensity-weighted dataset, we actually recover the Average Treatment Effect (ATE).
  23. How to do inverse propensity weighting
    1. Train the propensity model
    2. Calibrate the propensities
    3. Predict the propensity for the observed treatments
    4. Clip (or trim) the propensities
    5. Train the outcome model on inverse-propensity-weighted samples
  24. Some final remarks
    § Predictive quality on the outcome ≠ predictive quality on the treatment effect, because the validation & test sets also carry the treatment selection bias.
    § You need some randomization in the data: for subgroups that are never (or always) treated, you cannot learn the treatment effect.
    § The propensity model doesn't need to be perfect; it only needs to represent the treatment selection bias (the confounders).
  25. Further reading
    Blog posts:
    § "Understanding Inverse Propensity Weighting" (Medium): https://medium.com/bigdatarepublic/understanding-inverse-propensity-weighting-a191d94bb2eb
    § "Preventing Churn like a Bandit" (Medium): https://medium.com/bigdatarepublic/preventing-churn-like-a-bandit-49b7c51b4929
    Books:
    § The Book of Why, Judea Pearl
    § Causal Inference: What If (free): https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
    Good explanation of many algorithms:
    § EconML: https://econml.azurewebsites.net/
  26. [email protected]