Randomized experiments have become ubiquitous in many fields. Traditionally, we have focused on reporting the average treatment effect (ATE) from such experiments. With recent advances in machine learning, and the overall scale at which experiments are now conducted, we can broaden our analysis to include heterogeneous treatment effects. This provides a more nuanced view of the effect of a treatment or change on the outcome of interest. Going one step further, we can use models of heterogeneous treatment effects to optimally allocate treatment. In this talk, we will provide a brief overview of heterogeneous treatment effect modeling. We will show how to apply some recently proposed methods using R, and compare the results of each using a question wording experiment from the General Social Survey. Finally, we will conclude with some practical issues in modeling heterogeneous treatment effects, including model selection and obtaining valid confidence intervals.
Modeling Heterogeneous Treatment Effects with R
July 12, 2018
Should we rebrand “welfare” as “assistance to the poor”?
Welfare - United States
Welfare in the United States refers to an assortment of assistance programs at
the Federal and State levels.
• cash or wage assistance
• healthcare (Medicaid)
• food (SNAP)
• utilities (natural gas, electricity)
Let’s run an experiment!
General Social Survey (GSS)
The experimental data we’re looking at today comes from the General Social Survey (GSS).
Since 1972, the General Social Survey (GSS) has provided politicians, policymakers, and scholars with a clear and unbiased perspective on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.

The survey is typically fielded every two years with a large overlap in questions across waves.
GSS Framing Experiment
The GSS began an ongoing question framing experiment in 1986.
We are faced with many problems in this country, none of which can be solved
easily or inexpensively. I’m going to name some of these problems, and for each
one I’d like you to tell me whether you think we’re spending too much money on
it, too little money, or the right amount.
Are we spending too much, too little, or about the right amount on [TREATMENT].
Treatment: “assistance to the poor” (control: “welfare”)

Variables in the data:

year: the survey year
treatment: the question treatment, welfare or assistance
response: response to the spending question, 1 if “too much”
partyid: party identification of respondent, from Democrat to Republican
polviews: political views of respondent, from liberal to conservative
age: age of respondent
educ: respondent years of education
racial_attitude_index: composite index of negative racial attitudes2
2Green and Kern, “Modeling heterogeneous treatment effects in survey experiments with Bayesian
additive regression trees”.
Average Treatment Effect (ATE)
The average treatment effect (ATE) tells us the overall effect of the treatment. The
ATE is the difference in outcomes between the treatment and control groups,
ATE = E[y | t = treatment] − E[y | t = control].
ATE - dplyr
> gss %>%
    group_by(treatment) %>%
    summarize(avg = mean(response)) %>%
    spread(treatment, avg) %>%
    summarize(ate = assistance - welfare)
# A tibble: 1 x 1
ATE - lm
> lm(response ~ treatment, data = gss)
Heterogeneous Treatment Effects
The Neyman-Rubin causal model:
respondent  Yi(0)       Yi(1)      treatment
1           ?           too much   assistance
2           too little  ?          welfare
3           too little  ?          welfare

Yi(0) and Yi(1) are called potential outcomes. When respondent i is treated, we observe Yi(1); when they are untreated, we observe Yi(0).
3Rubin, “Estimating causal effects of treatments in randomized and nonrandomized studies.”
Conditional Average Treatment Effect (CATE)
The average treatment effect is useful: it allows us to compare different treatments for overall effectiveness. But it’s a population average. A more interesting measure is the conditional average treatment effect (CATE),

CATE(x) = E[Y(1) − Y(0) | X = x].

We can see the effect of the treatment on groups with a particular value x of the covariates.
Example: CATE of Political Views
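The original slide shows this example as a plot. The subgroup estimates behind such a plot can be sketched with dplyr, assuming the gss data frame described above (this is a sketch, not part of the original slides):

```r
library(dplyr)
library(tidyr)

# difference in means between question wordings,
# computed within each level of polviews
gss %>%
  group_by(polviews, treatment) %>%
  summarize(avg = mean(response)) %>%
  spread(treatment, avg) %>%
  mutate(cate = assistance - welfare)
```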
Missing Data Problem
respondent  Yi(0)       Yi(1)      treatment   age  …  educ
1           ?           too much   assistance  34   …  16
2           too little  ?          welfare     41   …  12
3           too little  ?          welfare     53   …  20
Let’s use machine learning to estimate the missing potential outcomes.
Modeling Approaches: Single Model

μ̂ = M(Y ∼ (X, T))
CATE(x) = μ̂(x, 1) − μ̂(x, 0)

• use any ML/statistical model M(·, ·)
• include the treatment indicator T
• include all treatment and covariate interactions
library(dplyr)
library(randomForest)

# single model, fit with the treatment indicator included
m <- randomForest(response ~ ., data = gss)

# score everyone as treated and as control
gss_treated <- gss %>%
  mutate(treatment = factor("assistance",
                            levels = c("welfare", "assistance")))
gss_control <- gss %>%
  mutate(treatment = factor("welfare",
                            levels = c("welfare", "assistance")))

# predicted probability of "too much" under each condition
y_1 <- predict(m, gss_treated, type = "prob")[, 2]
y_0 <- predict(m, gss_control, type = "prob")[, 2]
cate <- y_1 - y_0
Figuring out if we have a decent model is tough. We never observe the same
people under both treatment and control, so we can’t use traditional metrics like
MSE or accuracy.
True vs Predicted CATE Quantiles
• Using a holdout set or cross-validation, get a set of out-of-sample treatment effect scores from a given model.
• Quantile those scores and calculate the true ATE within each quantile.
• Check that those predictions order well and see how they compare to the average predictions in each quantile.
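The steps above can be sketched with dplyr, assuming gss carries a column cate_hat of out-of-sample treatment effect scores (a sketch; the column name is illustrative):

```r
library(dplyr)
library(tidyr)

# bin observations by predicted effect
by_decile <- gss %>%
  mutate(decile = ntile(cate_hat, 10))

# true ATE within each score decile
actual <- by_decile %>%
  group_by(decile, treatment) %>%
  summarize(avg = mean(response)) %>%
  spread(treatment, avg) %>%
  mutate(actual_ate = assistance - welfare)

# average predicted effect within each decile, for comparison
predicted <- by_decile %>%
  group_by(decile) %>%
  summarize(predicted_ate = mean(cate_hat))

inner_join(actual, predicted, by = "decile")
```

If the model orders well, actual_ate should increase with decile and track predicted_ate.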
Uplift Curve

• The uplift curve represents the incremental gain from using the model to target effort or outreach.
• Similar to the quantile plot, rank observations by predicted ATE and compare to the actual ATE in each group (red line).
• Compare this to randomly ordering observations (blue line).
Qini Coefficient

• The qini coefficient is analogous to the area under the ROC curve (AUC) for supervised learning.
• A single metric we can use to compare models fit to the same data.
• Scale matters, so we can’t use it to compare models in absolute terms.
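A cumulative uplift curve can be sketched in base R plus dplyr, again assuming a cate_hat column of predicted effects (a sketch; early rows may produce NaN before both groups appear):

```r
library(dplyr)

# rank by predicted effect, then track cumulative incremental gain
uplift <- gss %>%
  arrange(desc(cate_hat)) %>%
  mutate(n = row_number(),
         treated = treatment == "assistance",
         cum_resp_t = cumsum(response * treated),
         cum_resp_c = cumsum(response * !treated),
         cum_n_t = cumsum(treated),
         cum_n_c = cumsum(!treated),
         # estimated incremental responses among the first n contacted
         gain = (cum_resp_t / cum_n_t - cum_resp_c / cum_n_c) * n)

plot(uplift$n, uplift$gain, type = "l",
     xlab = "observations targeted", ylab = "incremental gain")
```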
4Radcliffe and Surry, “Real-world uplift modelling with significance-based uplift trees”.
Modeling Approaches, Continued
μ̂0 = M0(Y0 ∼ X0)
μ̂1 = M1(Y1 ∼ X1)
CATE(x) = μ̂1(x) − μ̂0(x)

• use two models
• M0(·) estimated with the control group
• M1(·) estimated with the treatment group
m0 <- randomForest(response ~ . -treatment,
data = filter(gss, treatment == "welfare"))
m1 <- randomForest(response ~ . -treatment,
data = filter(gss, treatment == "assistance"))
y_0 <- predict(m0, gss, type = "prob")[, 2]
y_1 <- predict(m1, gss, type = "prob")[, 2]
cate <- y_1 - y_0
X-learner

μ̂0(x) = M1(Y0 ∼ X0)
μ̂1(x) = M2(Y1 ∼ X1)

D̃1 = Y1 − μ̂0(X1)
D̃0 = μ̂1(X0) − Y0

CATE1(x) = M3(D̃1 ∼ X1)
CATE0(x) = M4(D̃0 ∼ X0)

CATE(x) = g(x)CATE0(x) + (1 − g(x))CATE1(x)

• M1(·) and M2(·) estimate the response in the control and treatment groups
• D̃1 and D̃0 are the imputed CATEs
• g(x) is a weighting function, typically an estimate of the propensity score
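Following the pattern of the single- and two-model code above, the X-learner stages can be sketched with randomForest (a sketch; in practice packages such as hete, shown below, implement this):

```r
library(dplyr)
library(randomForest)

gss_c <- filter(gss, treatment == "welfare")
gss_t <- filter(gss, treatment == "assistance")

# stage 1: response models fit in each group
m1 <- randomForest(response ~ . -treatment, data = gss_c)
m2 <- randomForest(response ~ . -treatment, data = gss_t)

# stage 2: imputed treatment effects, treating the response
# as 0/1 via the predicted probability of "too much"
y_t <- as.numeric(gss_t$response) - 1
y_c <- as.numeric(gss_c$response) - 1
d1 <- y_t - predict(m1, gss_t, type = "prob")[, 2]
d0 <- predict(m2, gss_c, type = "prob")[, 2] - y_c

# model the imputed effects (regression, not classification)
m3 <- randomForest(d1 ~ . -treatment -response,
                   data = mutate(gss_t, d1 = d1))
m4 <- randomForest(d0 ~ . -treatment -response,
                   data = mutate(gss_c, d0 = d0))

# stage 3: combine; with random assignment the propensity
# score is constant, so use the share treated as g(x)
g <- nrow(gss_t) / nrow(gss)
cate <- g * predict(m4, gss) + (1 - g) * predict(m3, gss)
```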
5Künzel et al., “Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning”.
Generalized Random Forest (GRF) 6
• CART/RandomForest inspired
• directly estimates CATE
• guarantees for consistency and bias
• proper confidence intervals
• CRAN: grf
6Athey, Tibshirani, and Wager, “Generalized random forests”
7D’Agostino and Lattner, The power of persuasion modeling
library(grf)

# design matrix, outcome, and 0/1 treatment indicator
x <- model.matrix(response ~ . -treatment, data = gss)
y <- gss$response
tmt <- ifelse(gss$treatment == "welfare", 0, 1)

m <- causal_forest(x, y, tmt)
cate <- predict(m, estimate.variance = TRUE)
• interactive, split, and x-learner
• formula interface
• plugin any ML model/estimator
• uplift curve
• on GitHub: github.com/wlattner/hete
m <- hete_single(response ~ year + educ + age | treatment,
data = gss, est = random_forest)
cate <- predict(m, gss)
So, should we rebrand “welfare”?
Angrist, Joshua D. and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An
Empiricist’s Companion. Princeton University Press, Dec. 2008. isbn: 0691120358.
Athey, Susan, Julie Tibshirani, and Stefan Wager. “Generalized random forests”. In:
arXiv preprint arXiv:1610.01271 (2016).
Chernozhukov, Victor et al. Double machine learning for treatment and causal
parameters. Tech. rep. cemmap working paper, Centre for Microdata Methods
and Practice, 2016.
D’Agostino, Michelangelo and Bill Lattner. The power of persuasion modeling. Talk
at Strata + Hadoop World, San Jose, CA. 2017.
Green, Donald P and Holger L Kern. “Modeling heterogeneous treatment effects in
survey experiments with Bayesian additive regression trees”. In: Public opinion
quarterly 76.3 (2012), pp. 491–511.
Imbens, Guido W and Donald B Rubin. Causal inference in statistics, social, and
biomedical sciences. Cambridge University Press, 2015.
Künzel, Sören R et al. “Meta-learners for Estimating Heterogeneous Treatment
Effects using Machine Learning”. In: arXiv preprint arXiv:1706.03461 (2017).
Radcliffe, Nicholas J and Patrick D Surry. “Real-world uplift modelling with significance-based uplift trees”. In: White Paper TR-2011-1, Stochastic Solutions (2011).
Rubin, Donald B. “Estimating causal effects of treatments in randomized and nonrandomized studies”. In: Journal of Educational Psychology 66.5 (1974), pp. 688–701.
GSS Data gss.norc.org
grf Package github.com/swager/grf
hete Package github.com/wlattner/hete