Modeling Heterogeneous Treatment Effects with R

Randomized experiments have become ubiquitous in many fields. Traditionally, we have focused on reporting the average treatment effect (ATE) from such experiments. With recent advances in machine learning, and the overall scale at which experiments are now conducted, we can broaden our analysis to include heterogeneous treatment effects. This provides a more nuanced view of the effect of a treatment or change on the outcome of interest. Going one step further, we can use models of heterogeneous treatment effects to optimally allocate treatment. In this talk, we will provide a brief overview of heterogeneous treatment effect modeling. We will show how to apply some recently proposed methods using R, and compare the results of each using a question wording experiment from the General Social Survey. Finally, we will conclude with some practical issues in modeling heterogeneous treatment effects, including model selection and obtaining valid confidence intervals.


Bill Lattner

July 11, 2018


  1. Modeling Heterogeneous Treatment Effects with R useR! 2018 Bill Lattner

    July 12, 2018
  2. Should we rebrand “welfare” as “assistance to the poor”?

  3. Welfare - United States

    Welfare in the United States refers to an assortment of assistance programs at the federal and state levels.
    • cash or wage assistance
    • healthcare (Medicaid)
    • food (SNAP)
    • utilities (natural gas, electricity)
  4. Let’s run an experiment!

  5. General Social Survey (GSS)

    The experimental data we’re looking at today comes from the General Social Survey.

    Since 1972, the General Social Survey (GSS) has provided politicians, policymakers, and scholars with a clear and unbiased perspective on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.

    The survey is typically fielded every two years with a large overlap in questions between years.
  6. GSS Framing Experiment

    The GSS began an ongoing question framing experiment in 1986.

    natfare/natfarey: We are faced with many problems in this country, none of which can be solved easily or inexpensively. I’m going to name some of these problems, and for each one I’d like you to tell me whether you think we’re spending too much money on it, too little money, or the right amount. Are we spending too much, too little, or about the right amount on [TREATMENT].

    Control: welfare
    Treatment: assistance to the poor
  7. GSS Variables

    year                    the survey year
    treatment               the question treatment, welfare or assistance
    response                response to spending question, 1 if too much
    partyid                 party identification of respondent, from democrat to republican
    polviews                political views of respondent, from liberal to conservative
    age                     age of respondent
    educ                    respondent years of education
    racial_attitude_index   composite index of negative racial attitudes²

    ² Green and Kern, “Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees”.
  8. Topline

  9. Average Treatment Effect (ATE)

    The average treatment effect (ATE) tells us the overall effect of the treatment. The ATE is the difference in average outcomes between the treatment and control groups, ATE = E[y | t = treatment] − E[y | t = control].
  10. ATE - dplyr

    > library(dplyr)
    > library(tidyr)  # provides spread()
    > gss %>%
        group_by(treatment) %>%
        summarize(avg = mean(response)) %>%
        spread(treatment, avg) %>%
        summarize(ate = assistance - welfare)
    # A tibble: 1 x 1
         ate
       <dbl>
    1 -0.347
  11. ATE - lm

    > lm(response ~ treatment, data = gss)

    Call:
    lm(formula = response ~ treatment, data = gss)

    Coefficients:
            (Intercept)  treatmentassistance
                 0.4550              -0.3467
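
The regression form of the ATE also yields a confidence interval for the treatment coefficient. The sketch below runs on simulated data, since the `gss` data frame is not reproduced here; the sample size and the two response rates are assumptions chosen to roughly match the coefficients reported on the slide.

```r
set.seed(3)
n <- 2000

# Simulated stand-in for the experiment (assumed rates, not the real GSS data)
treatment <- factor(sample(c("welfare", "assistance"), n, replace = TRUE),
                    levels = c("welfare", "assistance"))
response <- rbinom(n, 1, ifelse(treatment == "assistance", 0.11, 0.455))

# With a single binary regressor, the slope is the difference in group means
fit <- lm(response ~ treatment)
ate <- coef(fit)["treatmentassistance"]

# Classical OLS interval; robust standard errors may be preferable for a binary outcome
ci <- confint(fit, "treatmentassistance", level = 0.95)
print(ate)
print(ci)
```

The interval here relies on the usual homoskedastic OLS standard error, which is a simplification for a 0/1 response.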
  12. Heterogeneous Treatment Effects

  13. Potential Outcomes³

    The Neyman-Rubin causal model:

    respondent   Yi(0)        Yi(1)      treatment
    1            ?            too much   assistance
    2            too little   ?          welfare
    3            too little   ?          welfare

    Yi(0) and Yi(1) are called potential outcomes. When respondent i is treated, we observe Yi(1); when they are untreated, we observe Yi(0).

    ³ Rubin, “Estimating causal effects of treatments in randomized and nonrandomized studies.”
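
A small simulation can make the potential outcomes idea concrete. In the sketch below both potential outcomes are generated for every simulated respondent, which is never possible with real data; the response rates are assumptions for illustration only.

```r
set.seed(1)
n <- 1000

# Hypothetical potential outcomes: 1 = answers "too much"
y0 <- rbinom(n, 1, 0.45)   # response under the "welfare" wording (control)
y1 <- rbinom(n, 1, 0.10)   # response under the "assistance" wording (treatment)

# Random assignment reveals exactly one potential outcome per respondent
t <- rbinom(n, 1, 0.5)
y_obs <- ifelse(t == 1, y1, y0)

# The true ATE uses both columns, so it is only known in a simulation
true_ate <- mean(y1 - y0)

# The estimate uses only what was observed in each arm
est_ate <- mean(y_obs[t == 1]) - mean(y_obs[t == 0])

print(c(true = true_ate, estimated = est_ate))
```

Because assignment is random, the difference in observed means should land close to the true ATE even though no respondent contributes both outcomes.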
  14. Conditional Average Treatment Effect (CATE)

    The average treatment effect is useful: it allows us to compare different treatments for overall effectiveness. But it’s a population average. A more interesting measure is the conditional average treatment effect (CATE), CATE(x) = E[Y(1) − Y(0) | X = x]. We can see the effect of the treatment on groups with a particular value x of the pre-treatment covariates.
  15. Example: CATE of Political Views
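
For a single categorical covariate, the CATE can be estimated directly as a difference in means within each level. The following is a sketch on simulated data; the three `polviews` levels, the baseline rates, and the per-group effects are all assumptions, not values from the GSS.

```r
set.seed(42)
n <- 6000
lvls <- c("liberal", "moderate", "conservative")
polviews <- factor(sample(lvls, n, replace = TRUE), levels = lvls)
treated <- rbinom(n, 1, 0.5)

# Assumed baseline rates and treatment effects per group (illustrative only)
base_rate <- c(liberal = 0.30, moderate = 0.45, conservative = 0.60)
effect <- c(liberal = -0.20, moderate = -0.35, conservative = -0.45)
p <- base_rate[as.character(polviews)] + treated * effect[as.character(polviews)]
response <- rbinom(n, 1, p)
sim <- data.frame(polviews, treated, response)

# CATE(x): difference in mean response between arms within each level of x
cate_by_group <- sapply(split(sim, sim$polviews), function(d) {
  mean(d$response[d$treated == 1]) - mean(d$response[d$treated == 0])
})
print(round(cate_by_group, 3))
```

This only scales to a handful of discrete groups, which is why the model-based approaches in the rest of the talk are needed for many covariates.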

  16. Modeling Approaches

  17. Missing Data Problem

    respondent   Yi(0)        Yi(1)      treatment    age   …   educ
    1            ?            too much   assistance   34    …   16
    2            too little   ?          welfare      41    …   12
    3            too little   ?          welfare      53    …   20

    Let’s use machine learning to estimate the missing potential outcomes.
  18. Interactive Model

    $\hat{\mu} = M(Y \sim (X, T))$
    $\mathrm{CATE}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$

    • use any ML/statistical model M(·, ·)
    • include the treatment indicator T
    • include all treatment and covariate interactions
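  19. Interactive Model

    library(dplyr)
    library(randomForest)

    # response must be a factor for predict(type = "prob") to return class probabilities
    m <- randomForest(response ~ ., data = gss)

    # copies of the data with every respondent set to treatment / control
    gss_treated <- gss %>%
      mutate(treatment = factor("assistance", levels = c("welfare", "assistance")))
    gss_control <- gss %>%
      mutate(treatment = factor("welfare", levels = c("welfare", "assistance")))

    # predicted probability of the second response class under each wording
    y_1 <- predict(m, gss_treated, type = "prob")[, 2]
    y_0 <- predict(m, gss_control, type = "prob")[, 2]
    cate <- y_1 - y_0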
  20. Model Evaluation

  21. Evaluation Figuring out if we have a decent model is

    tough. We never observe the same people under both treatment and control, so we can’t use traditional metrics like MSE or accuracy.
  22. True vs Predicted CATE Quantiles

    • Using a holdout set or cross-validation, get a set of out-of-sample treatment effect scores from a given model.
    • Quantile those scores and calculate the true ATE within each quantile.
    • Check that those predictions order well and see how they compare to the average predictions in each quantile.
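
The quantile check described on this slide can be sketched in base R. The data and the `score` vector below are simulated stand-ins: in practice `score` would hold out-of-sample CATE predictions from a fitted model, and the assumed effect (growing more negative with `x`) is purely illustrative.

```r
set.seed(7)
n <- 5000
x <- runif(n)
treated <- rbinom(n, 1, 0.5)

# Assumed truth: the treatment effect is -0.4 * x
y <- rbinom(n, 1, 0.5 - 0.4 * x * treated)

# Hypothetical out-of-sample scores: the true effect plus noise
score <- -0.4 * x + rnorm(n, sd = 0.05)

# Cut the scores into quintiles (bin 1 = most negative predicted effect)
bin <- cut(score, breaks = quantile(score, probs = seq(0, 1, 0.2)),
           include.lowest = TRUE, labels = FALSE)

# Within each bin: average prediction vs. observed difference in means
by_bin <- data.frame(
  bin = 1:5,
  predicted = as.numeric(tapply(score, bin, mean)),
  observed = as.numeric(tapply(seq_len(n), bin, function(i) {
    mean(y[i][treated[i] == 1]) - mean(y[i][treated[i] == 0])
  }))
)
print(by_bin)
```

A model that orders respondents well should show `observed` moving in the same direction as `predicted` across bins; plotting the two columns against `bin` gives the quantile plot.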
  23. Uplift

    • The uplift curve represents the incremental gain from using the model to target effort or outreach.
    • Similar to the quantile plot, rank observations by predicted ATE and compare to actual ATE in each group (red line).
    • Compare this to randomly ordering observations (blue line).
  24. qini⁴

    • The qini coefficient is analogous to the area under the ROC curve (AUC) for supervised learning.
    • A single metric we can use to compare models fit to the same task.
    • Scale matters, so we can’t use it to compare models in absolute terms.

    ⁴ Radcliffe and Surry, “Real-world uplift modelling with significance-based uplift trees”.
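
One common way to compute a Qini-style curve and coefficient is sketched below. This is an unnormalized variant written for illustration, not necessarily the exact definition used in the talk or in any package; `qini_curve`, `qini_coef`, and the simulated data are all hypothetical.

```r
set.seed(11)
n <- 10000
x <- runif(n)
treated <- rbinom(n, 1, 0.5)

# Assumed truth: treatment raises the outcome rate by 0.3 * x
y <- rbinom(n, 1, 0.2 + 0.3 * x * treated)

qini_curve <- function(score, y, treated) {
  ord <- order(score, decreasing = TRUE)   # largest predicted uplift first
  y <- y[ord]
  treated <- treated[ord]
  n_t <- cumsum(treated)                   # treated units seen so far
  n_c <- cumsum(1 - treated)               # control units seen so far
  r_t <- cumsum(y * treated)               # treated responders so far
  r_c <- cumsum(y * (1 - treated))         # control responders so far
  # Incremental responders, rescaling control counts to the treated group size
  r_t - ifelse(n_c > 0, r_c * n_t / n_c, 0)
}

qini_coef <- function(score, y, treated) {
  q <- qini_curve(score, y, treated)
  # Straight line from zero to the curve's endpoint: the random-targeting baseline
  random_line <- seq_along(q) / length(q) * q[length(q)]
  mean(q - random_line)                    # average gap, proportional to the area
}

good_model <- x + rnorm(n, sd = 0.1)       # hypothetical scores tracking the true effect
no_model <- runif(n)                       # random ordering

print(c(model = qini_coef(good_model, y, treated),
        random = qini_coef(no_model, y, treated)))
```

The coefficient is in units of incremental responders, so it depends on sample size and base rates; that is the sense in which scale matters when comparing values across tasks.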
  25. Modeling Approaches, Continued

  26. Split Model

    $\hat{\mu}_0 = M_0(Y^0 \sim X^0)$
    $\hat{\mu}_1 = M_1(Y^1 \sim X^1)$
    $\mathrm{CATE}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$

    • use two models
    • M0(·) estimated with control group
    • M1(·) estimated with treatment group
  27. Split Model

    # one model per arm, each fit without the treatment indicator
    m0 <- randomForest(response ~ . -treatment,
                       data = filter(gss, treatment == "welfare"))
    m1 <- randomForest(response ~ . -treatment,
                       data = filter(gss, treatment == "assistance"))

    # m0 predicts the control outcome, m1 the treated outcome
    y_0 <- predict(m0, gss, type = "prob")[, 2]
    y_1 <- predict(m1, gss, type = "prob")[, 2]
    cate <- y_1 - y_0
  28. X-Learner⁵

    $\hat{\mu}_0(x) = M_1(Y^0 \sim X^0)$
    $\hat{\mu}_1(x) = M_2(Y^1 \sim X^1)$
    $\tilde{D}^0 = \hat{\mu}_1(X^0) - Y^0$
    $\tilde{D}^1 = Y^1 - \hat{\mu}_0(X^1)$
    $\mathrm{CATE}_0(x) = M_4(\tilde{D}^0 \sim X^0)$
    $\mathrm{CATE}_1(x) = M_3(\tilde{D}^1 \sim X^1)$
    $\mathrm{CATE}(x) = g(x)\,\mathrm{CATE}_0(x) + (1 - g(x))\,\mathrm{CATE}_1(x)$

    • M1 and M2 estimate the response in the control and treatment groups
    • $\tilde{D}^1$ and $\tilde{D}^0$ are the imputed CATE
    • g(x) is a weighting function, typically the propensity score or treated fraction

    ⁵ Künzel et al., “Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning”.
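
The X-learner stages can be sketched end to end in base R. The example below uses `lm` for every model M1 through M4 and a constant treated fraction for g(x); both are simplifying assumptions, as is the simulated data with a known linear CATE. Any regression learner could be substituted for `lm`.

```r
set.seed(2018)
n <- 2000
x <- runif(n, -1, 1)
w <- rbinom(n, 1, 0.5)                 # treatment indicator
tau <- 1 + 2 * x                       # assumed true CATE, known only in simulation
y <- 0.5 * x + w * tau + rnorm(n)
d <- data.frame(x, w, y)
d0 <- d[d$w == 0, ]                    # control group
d1 <- d[d$w == 1, ]                    # treatment group

# Stage 1: outcome models for each arm (M1 and M2)
mu0 <- lm(y ~ x, data = d0)
mu1 <- lm(y ~ x, data = d1)

# Stage 2: imputed individual effects, using the other arm's model
d0$D <- predict(mu1, d0) - d0$y        # D~0 = mu1(X0) - Y0
d1$D <- d1$y - predict(mu0, d1)        # D~1 = Y1 - mu0(X1)

# Stage 3: regress the imputed effects on the covariates (M4 and M3)
cate0 <- lm(D ~ x, data = d0)
cate1 <- lm(D ~ x, data = d1)

# Stage 4: blend the two estimates, here with the treated fraction as g(x)
g <- mean(d$w)
cate_hat <- g * predict(cate0, d) + (1 - g) * predict(cate1, d)

print(c(mean_estimate = mean(cate_hat), mean_truth = mean(tau)))
```

With balanced arms the weighting has little effect; it matters more when one arm is much larger than the other, since the estimate built on the larger arm's outcome model tends to be more reliable.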
  29. Generalized Random Forest (GRF)⁶ ⁷

    • CART/RandomForest inspired
    • directly estimates CATE
    • guarantees for consistency and bias
    • proper confidence intervals
    • CRAN: grf

    ⁶ Athey, Tibshirani, and Wager, “Generalized random forests”.
    ⁷ D’Agostino and Lattner, The power of persuasion modeling.
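  30. GRF

    library(grf)

    # covariate matrix without the treatment column, numeric outcome, 0/1 treatment
    x <- model.matrix(response ~ . -treatment, data = gss)
    y <- gss$response
    tmt <- ifelse(gss$treatment == "welfare", 0, 1)

    m <- causal_forest(x, y, tmt)

    # out-of-bag CATE predictions with variance estimates
    cate <- predict(m, estimate.variance = TRUE)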
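
The variance estimates returned alongside the predictions can be turned into approximate pointwise confidence intervals. The sketch below uses a made-up data frame with the `predictions` and `variance.estimates` columns that `predict(m, estimate.variance = TRUE)` is expected to return, so that it runs without the GSS data or a fitted forest; the numbers are illustrative only.

```r
# Stand-in for the output of predict(m, estimate.variance = TRUE); values are made up
cate <- data.frame(
  predictions = c(-0.40, -0.35, -0.10),
  variance.estimates = c(0.0025, 0.0100, 0.0400)
)

# Normal-approximation 95% interval around each predicted effect
z <- qnorm(0.975)
se <- sqrt(cate$variance.estimates)
cate$lower <- cate$predictions - z * se
cate$upper <- cate$predictions + z * se

# Flag respondents whose interval excludes zero
cate$significant <- cate$upper < 0 | cate$lower > 0
print(cate)
```

These are per-observation intervals, so flagging many respondents this way raises the usual multiple-comparison concerns.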
  31. hete Package

  32. hete Package

    • interactive, split, and x-learner
    • formula interface
    • plug in any ML model/estimator
    • uplift curve
    • plots
    • on GitHub:
  33. hete Package

    m <- hete_single(response ~ year + educ + age | treatment,
                     data = gss, est = random_forest)
    plot(m)
    cate <- predict(m, gss)
  34. So, should we rebrand “welfare”?

  35. Rebrand? Probably!

  36. References i

    Angrist, Joshua D. and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, Dec. 2008. ISBN: 0691120358.
    Athey, Susan, Julie Tibshirani, and Stefan Wager. “Generalized random forests”. In: arXiv preprint arXiv:1610.01271 (2016).
    Chernozhukov, Victor et al. Double machine learning for treatment and causal parameters. Tech. rep. cemmap working paper, Centre for Microdata Methods and Practice, 2016.
    D’Agostino, Michelangelo and Bill Lattner. The power of persuasion modeling. Talk at Strata + Hadoop World, San Jose, CA. 2017.
  37. References ii

    Green, Donald P. and Holger L. Kern. “Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees”. In: Public Opinion Quarterly 76.3 (2012), pp. 491–511.
    Imbens, Guido W. and Donald B. Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press, 2015.
    Künzel, Sören R. et al. “Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning”. In: arXiv preprint arXiv:1706.03461 (2017).
    Radcliffe, Nicholas J. and Patrick D. Surry. “Real-world uplift modelling with significance-based uplift trees”. In: White Paper TR-2011-1, Stochastic Solutions (2011).
  38. References iii

    Rubin, Donald B. “Estimating causal effects of treatments in randomized and nonrandomized studies”. In: Journal of Educational Psychology 66.5 (1974), p. 688.
  39. Thank you!

    Twitter: @wlattner
    GitHub · Slides · GSS Data · grf Package · hete Package