Causal inference made easy with Inverse Propensity Weighting

Gerben Oostra, Senior Data Scientist @ Vianai Systems IL
© VIANAI SYSTEMS, INC. PROPRIETARY & CONFIDENTIAL

Gerben Oostra
Senior Data Scientist @ Vianai Systems Israel
Living in Haarlem, the Netherlands
Interests: Causal Inference, Contextual Bandits, ML Engineering, trail running

Build the world’s first Decision Optimization Platform to reliably prescribe, validate and execute actions that optimize business outcomes
Self service, causal inference, actual impact

https://www.linkedin.com/company/vianai/jobs/

Causal Inference

Why Causal Inference matters
§ Descriptive (statistics): What happened?
§ Predictive (correlations): Given X, what is Y?
§ Prescriptive (causality): Which action should I take?

Examples
Acquisition
§ Predictive: How likely is this prospect?
§ Prescriptive: Would it help if I called?
Product engagement
§ Predictive: Will X be a MAU?
§ Prescriptive: Will an email improve engagement?
Churn
§ Predictive: How likely will X churn?
§ Prescriptive: Who should I call to prevent churn?
Predictive: keeping everything as it is. Prescriptive: taking different actions.

The goal of causal inference: To predict the effect of an action, given the known context
This answers questions like
§ What if I do ..? Why? (Intervention)
§ What if I had …? Why? (Counterfactuals)

Inverse Propensity Weighting

The challenge in Causal Inference
Causal graph: features 𝑥, treatment T, result R
Historically: past campaigns. Future: our model.
§ Correlation has predictive power
§ We need causation for prescriptive power
§ We only observe correlations
§ Graph annotations: "Here correlation is causation" and "Correlation is not causation"
§ Good predictive power != prescriptive power

Determining unbiased reward estimates
Randomized Controlled Trial (RCT): run an experiment with random treatment assignment, so there is no bias.
+ Simple & straightforward
+ Correlation is again causation
- Expensive: discarding operational data, past & future
Observational Data: use historically collected data, adjust for biases.
+ Using historical data
+ Can use future data: feedback loop
- Need to know all confounders
- The system in which you operate could be too deterministic

Unbiased reward predictions using observational data
Steps:
1. Include confounders
If your direct model is 100% accurate (RMSE ≅ 0 / AUC ≅ 1), you’re already fine.
If you have errors, they are mainly minimized for records matching the current policy $\pi_0$.
2. Inverse propensity weighting (today)
IPW ensures the errors are random across the population (generalizing predictions).

Causal inference with Inverse propensity weighting
Causal graph: features 𝑥, treatment T, result R
Propensity model to learn the correlation: from the features (age / location / ..) the propensity model predicts $P(T \mid x)$ for each treatment (T1, T2).
Weight samples inverse to propensity: $\omega_i = \frac{1}{P(T \mid x)}$
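Why weighting by the inverse propensity removes the selection bias can be sketched with a short derivation. It is not taken from the slides and rests on two standard assumptions, stated here as assumptions: all confounders are contained in $x$ (so the hypothetical result $R(t)$ under treatment $t$ is independent of $T$ given $x$), and every treatment has a nonzero propensity for every $x$.

```latex
\begin{aligned}
\mathbb{E}\left[\frac{\mathbb{1}[T = t]\, R}{P(T = t \mid x)}\right]
  &= \mathbb{E}_x\left[\frac{\mathbb{E}\big[\mathbb{1}[T = t]\, R(t) \mid x\big]}{P(T = t \mid x)}\right] \\
  &= \mathbb{E}_x\left[\frac{P(T = t \mid x)\, \mathbb{E}[R(t) \mid x]}{P(T = t \mid x)}\right] \\
  &= \mathbb{E}[R(t)]
\end{aligned}
```

Under those assumptions, the weighted average of the observed results for treatment $t$ matches the average result if everyone had received $t$.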

How to do inverse propensity weighting
1. Train propensity model
2. Predict propensity for observed treatments
3. Train outcome model, on inverse propensity weighted samples
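The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: the simulated dataset, its coefficients, and the choice of scikit-learn's `LogisticRegression` and `LinearRegression` are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical biased dataset: feature x drives both the treatment and the result.
x = rng.uniform(0, 1, n)
t = rng.binomial(1, 1 / (1 + np.exp(-(4 * x - 3))))  # high x is treated more often
r = 1 + 2 * x + 4 * x * t + rng.normal(0, 0.5, n)    # true effect is 4x, so ATE = 2

X = x.reshape(-1, 1)

# 1. Train propensity model P(T | x)
propensity_model = LogisticRegression().fit(X, t)

# 2. Predict the propensity of the treatment that was actually observed
p_treated = propensity_model.predict_proba(X)[:, 1]
p_observed = np.where(t == 1, p_treated, 1 - p_treated)

# 3. Train outcome model on inverse propensity weighted samples
weights = 1 / p_observed
outcome_model = LinearRegression().fit(np.column_stack([x, t]), r, sample_weight=weights)

ate_ipw = outcome_model.coef_[1]  # coefficient of the treatment column
print(f"IPW estimate of the ATE: {ate_ipw:.2f}")
```

Because the weights are passed as `sample_weight`, any outcome model whose `fit` accepts sample weights could be swapped in.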

Test case: a biased dataset
Causal graph: features 𝑥, treatment T, result R

Our biased dataset

Direct model (no IPW)
Model: $R = C_0 + C_1 X + C_2 T$
Loss function: RMSE
Effect $= m(X, 1) - m(X, 0) = C_2$
Single model, aka S Learner
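A minimal sketch of such a direct model, fitted by least squares with NumPy. The dataset is a hypothetical one made up for this example (the deck's own dataset is only shown as a plot): treatment is more likely for high `x`, and the true effect grows with `x`, so the true ATE is 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical biased dataset (assumed for illustration only).
x = rng.uniform(0, 1, n)
t = rng.binomial(1, 1 / (1 + np.exp(-(4 * x - 3))))  # high x is treated more often
r = 1 + 2 * x + 4 * x * t + rng.normal(0, 0.5, n)    # true effect is 4x, so ATE = 2

# Single model (S learner): R = C0 + C1 * X + C2 * T, fitted by least squares,
# which has the same minimizer as the RMSE loss.
design = np.column_stack([np.ones(n), x, t])
(c0, c1, c2), *_ = np.linalg.lstsq(design, r, rcond=None)

def m(x_value, t_value):
    """Predicted result for feature x_value under treatment t_value."""
    return c0 + c1 * x_value + c2 * t_value

# The estimated effect is the same for every x: m(x, 1) - m(x, 0) = C2
effect = m(0.3, 1) - m(0.3, 0)
print(f"Direct model effect estimate: {effect:.2f} (true ATE: 2.00)")
```

In this setup the linear model cannot represent the varying effect, and without weighting its single coefficient tends to land above the true ATE because the frequently treated high-`x` records dominate the fit.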

Direct model (no IPW)
The chosen model is not capable of predicting the treatment effect
With RMSE loss function, we expect to get the Average Treatment Effect (ATE)
[Plot: our model’s estimate vs. the actual ATE; the gap is the bias due to confounders]

Making it work – Propensity Model
[Plots: distribution of the treatment propensity P(T|X); calibration plot (reliability curve) of mean predicted value vs. actual fraction of positives]

Making it work – Calibrated propensity model
[Plots: distribution of the treatment propensity P(T|X); calibration plot (reliability curve) of mean predicted value vs. actual fraction of positives]
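One way to produce such a reliability curve and a calibrated propensity model is sketched below with scikit-learn. The talk does not say which tools were used, so the simulated data, the `GaussianNB` base model, and the isotonic method are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: treatment is more likely for high x.
x = rng.uniform(0, 1, n)
t = rng.binomial(1, 1 / (1 + np.exp(-(4 * x - 3))))
X_train, X_test, t_train, t_test = train_test_split(
    x.reshape(-1, 1), t, test_size=0.25, random_state=0
)

# Uncalibrated propensity model versus the same model with isotonic calibration.
raw_model = GaussianNB().fit(X_train, t_train)
calibrated_model = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated_model.fit(X_train, t_train)

def reliability_gap(model):
    """Mean absolute gap between predicted propensity and actual treated fraction."""
    p = model.predict_proba(X_test)[:, 1]
    fraction_of_positives, mean_predicted = calibration_curve(t_test, p, n_bins=5)
    return np.mean(np.abs(fraction_of_positives - mean_predicted))

gap_raw = reliability_gap(raw_model)
gap_calibrated = reliability_gap(calibrated_model)
print(f"gap raw: {gap_raw:.3f}, gap calibrated: {gap_calibrated:.3f}")
```

Plotting `mean_predicted` against `fraction_of_positives` gives the reliability curve; a well calibrated model stays close to the diagonal.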

Propensity weighting: clipping
$\omega_i = \frac{1}{P(T \mid x)}$, so $\lim_{P(T \mid x) \to 0} \omega_i = \infty$
1. Clip value
$P'(T \mid x) = \min(0.95, \max(0.05, P(T \mid x)))$
2. Trim dataset
$\omega'_i = \begin{cases} P(T \mid x)^{-1}, & 0.05 \le P(T \mid x) \le 0.95 \\ 0, & P(T \mid x) > 0.95 \\ 0, & P(T \mid x) < 0.05 \end{cases}$
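Both options can be sketched with NumPy. The function names are hypothetical; the 0.05 and 0.95 bounds are the ones on the slide.

```python
import numpy as np

def clipped_weights(p, low=0.05, high=0.95):
    """Option 1: clip the propensity into [low, high], then invert it."""
    p_clipped = np.minimum(high, np.maximum(low, p))
    return 1 / p_clipped

def trimmed_weights(p, low=0.05, high=0.95):
    """Option 2: give weight 0 to samples whose propensity is outside [low, high]."""
    p = np.asarray(p, dtype=float)
    weights = np.zeros_like(p)
    inside = (p >= low) & (p <= high)
    weights[inside] = 1 / p[inside]
    return weights

propensities = np.array([0.01, 0.05, 0.5, 0.95, 0.99])
print(clipped_weights(propensities))  # extreme propensities are pulled to the bounds
print(trimmed_weights(propensities))  # extreme propensities get weight 0
```

Clipping keeps every sample but caps its influence, while trimming drops the extreme samples from the weighted fit altogether.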

Making it work with IPW
With the inverse propensity weighted dataset, we actually get the Average Treatment Effect (ATE)

How to do inverse propensity weighting
1. Train propensity model
2. Calibrate propensities
3. Predict propensity for observed treatments
4. Propensity clipping
5. Train outcome model, on inverse propensity weighted samples
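Put together, the five steps might look like the sketch below. The function name `fit_ipw_outcome_model`, the simulated data, and the scikit-learn model choices are assumptions for illustration, not the implementation used in the talk.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LinearRegression, LogisticRegression

def fit_ipw_outcome_model(X, t, r, low=0.05, high=0.95):
    """Fit a linear outcome model on inverse propensity weighted samples."""
    # 1 + 2. Train the propensity model and calibrate it (isotonic, cross-validated)
    propensity_model = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)
    propensity_model.fit(X, t)
    # 3. Predict the propensity of the treatment that was actually observed
    p_treated = propensity_model.predict_proba(X)[:, 1]
    p_observed = np.where(t == 1, p_treated, 1 - p_treated)
    # 4. Propensity clipping, which also guards against propensities of exactly 0
    p_clipped = np.clip(p_observed, low, high)
    # 5. Train the outcome model on inverse propensity weighted samples
    features = np.column_stack([X, t])
    return LinearRegression().fit(features, r, sample_weight=1 / p_clipped)

# Hypothetical biased dataset: true effect is 4x, so the true ATE is 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 1, n)
t = rng.binomial(1, 1 / (1 + np.exp(-(4 * x - 3))))
r = 1 + 2 * x + 4 * x * t + rng.normal(0, 0.5, n)

model = fit_ipw_outcome_model(x.reshape(-1, 1), t, r)
ate_ipw = model.coef_[-1]  # coefficient of the treatment column
print(f"IPW estimate of the ATE: {ate_ipw:.2f}")
```

Clipping comes after calibration on purpose: isotonic calibration can return propensities of exactly 0 or 1, which would otherwise produce infinite weights.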

Some final remarks
Predictive quality on outcome ≠ predictive quality on treatment effect
§ Because the validation & test sets also have treatment selection bias
Need some randomization in data
§ If subgroups are never (or always) treated, we cannot learn their treatment effect
Propensity model doesn’t need to be perfect
§ It only needs to represent the treatment selection bias (confounders)
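The need for some randomization can be checked before any modeling. The sketch below uses hypothetical helper names and assumes the subgroups are already defined by a categorical label; it flags subgroups that were never or always treated.

```python
from collections import defaultdict

def treated_fraction_by_group(groups, treatments):
    """Return the fraction of treated records per subgroup."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number treated, total]
    for group, treated in zip(groups, treatments):
        counts[group][0] += treated
        counts[group][1] += 1
    return {group: n_treated / total for group, (n_treated, total) in counts.items()}

def groups_without_overlap(groups, treatments):
    """Subgroups that were never or always treated, so no effect can be learned."""
    fractions = treated_fraction_by_group(groups, treatments)
    return sorted(group for group, fraction in fractions.items() if fraction in (0.0, 1.0))

# Toy example: "b" is always treated, "c" is never treated.
groups = ["a", "a", "b", "b", "c", "c"]
treatments = [0, 1, 1, 1, 0, 0]
print(groups_without_overlap(groups, treatments))
```

For continuous features, a similar check can be done on the predicted propensities, which is what the clipping and trimming bounds effectively do.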

Further reading
Blog posts:
§ Medium “Understanding Inverse Propensity Weighting” https://medium.com/bigdatarepublic/understanding-inverse-propensity-weighting-a191d94bb2eb
§ Medium “Preventing Churn like a Bandit” https://medium.com/bigdatarepublic/preventing-churn-like-a-bandit-49b7c51b4929
Books:
§ The Book of Why, Judea Pearl
§ Causal Inference: What If (free): https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
Good explanation of many algorithms:
§ EconML: https://econml.azurewebsites.net/
