
Modeling Heterogeneous Treatment Effects with R


Randomized experiments have become ubiquitous in many fields. Traditionally, we have focused on reporting the average treatment effect (ATE) from such experiments. With recent advances in machine learning, and the overall scale at which experiments are now conducted, we can broaden our analysis to include heterogeneous treatment effects. This provides a more nuanced view of the effect of a treatment or change on the outcome of interest. Going one step further, we can use models of heterogeneous treatment effects to optimally allocate treatment. In this talk, we will provide a brief overview of heterogeneous treatment effect modeling. We will show how to apply some recently proposed methods using R, and compare the results of each using a question wording experiment from the General Social Survey. Finally, we will conclude with some practical issues in modeling heterogeneous treatment effects, including model selection and obtaining valid confidence intervals.

Bill Lattner

July 11, 2018

Transcript

  1. Modeling Heterogeneous Treatment Effects with R
    useR! 2018
    Bill Lattner
    July 12, 2018


  2. Should we rebrand “welfare” as “assistance to the poor”?

  3. Welfare - United States
    Welfare in the United States refers to an assortment of assistance programs at
    the Federal and State levels.
    • cash or wage assistance
    • healthcare (Medicaid)
    • food (SNAP)
    • utilities (natural gas, electricity)

  4. Let’s run an experiment!

  5. General Social Survey (GSS)
    The experimental data we’re looking at today comes from the General Social
    Survey.
    Since 1972, the General Social Survey (GSS) has provided politicians,
    policymakers, and scholars with a clear and unbiased perspective on what
    Americans think and feel about such issues as national spending priorities,
    crime and punishment, intergroup relations, and confidence in institutions. [1]
    The survey is typically fielded every two years with a large overlap in questions
    between years.
    [1] http://gss.norc.org/

  6. GSS Framing Experiment
    The GSS began an ongoing question framing experiment in 1986.
    natfare/natfarey
    We are faced with many problems in this country, none of which can be solved
    easily or inexpensively. I’m going to name some of these problems, and for each
    one I’d like you to tell me whether you think we’re spending too much money on
    it, too little money, or the right amount.
    Are we spending too much, too little, or about the right amount on [TREATMENT].
    Control: welfare
    Treatment: assistance to the poor

  7. GSS Variables
    year: the survey year
    treatment: the question treatment, welfare or assistance
    response: response to the spending question, 1 if too much
    partyid: party identification of respondent, from democrat to republican
    polviews: political views of respondent, from liberal to conservative
    age: age of respondent
    educ: respondent’s years of education
    racial_attitude_index: composite index of negative racial attitudes [2]
    [2] Green and Kern, “Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees”.

  8. Topline

  9. Average Treatment Effect (ATE)
    The average treatment effect (ATE) tells us the overall effect of the treatment. The
    ATE is the difference in outcomes between the treatment and control groups,
    ATE = E[y | t = treatment] − E[y | t = control].

  10. ATE - dplyr
    > library(dplyr)
    > library(tidyr)  # provides spread()
    > gss %>%
        group_by(treatment) %>%
        summarize(avg = mean(response)) %>%
        spread(treatment, avg) %>%
        summarize(ate = assistance - welfare)
    # A tibble: 1 x 1
         ate
       <dbl>
    1 -0.347

  11. ATE - lm
    > lm(response ~ treatment, data = gss)
    Call:
    lm(formula = response ~ treatment, data = gss)
    Coefficients:
            (Intercept)  treatmentassistance
                 0.4550              -0.3467
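
    The dplyr and lm estimates agree because, with a single binary treatment indicator, the OLS slope equals the difference in group means. A minimal sketch on simulated data follows; the `sim` data frame and its response rates are invented for illustration and are not the GSS data. It also pulls a classical 95% confidence interval from the fit.

    ```r
    # Simulated stand-in for the GSS data; all values here are hypothetical.
    set.seed(42)
    n <- 2000
    sim <- data.frame(
      treatment = factor(sample(c("welfare", "assistance"), n, replace = TRUE),
                         levels = c("welfare", "assistance"))
    )
    # Made-up rates: P(response = 1) is 0.45 under control, 0.10 under treatment.
    sim$response <- rbinom(n, 1, ifelse(sim$treatment == "assistance", 0.10, 0.45))

    # Difference in mean response between the two arms.
    arm_means <- tapply(sim$response, sim$treatment, mean)
    ate_means <- unname(arm_means["assistance"] - arm_means["welfare"])

    # The same estimate from a linear model, plus a 95% confidence interval.
    fit <- lm(response ~ treatment, data = sim)
    ate_lm <- unname(coef(fit)["treatmentassistance"])
    ate_ci <- confint(fit)["treatmentassistance", ]
    ```

    The interval here is the classical OLS one; heteroskedasticity-robust standard errors would be a reasonable alternative for a binary outcome.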

  12. Heterogeneous Treatment Effects


  13. Potential Outcomes [3]
    The Neyman-Rubin causal model:
    respondent   Yi(0)        Yi(1)      treatment
    1            ?            too much   assistance
    2            too little   ?          welfare
    3            too little   ?          welfare
    Yi(0) and Yi(1) are called potential outcomes. When respondent i is treated, we
    observe Yi(1); when they are untreated, we observe Yi(0).
    [3] Rubin, “Estimating causal effects of treatments in randomized and nonrandomized studies.”
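
    To make the missing-outcome structure concrete, here is a small simulated sketch. The response probabilities and the `po` data frame are hypothetical; the point is only that each respondent reveals exactly one of the two potential outcomes.

    ```r
    set.seed(1)
    n <- 6
    # Hypothetical potential outcomes: 1 = "too much", 0 = anything else.
    y0 <- rbinom(n, 1, 0.5)   # response if asked about "welfare"
    y1 <- rbinom(n, 1, 0.2)   # response if asked about "assistance to the poor"
    treated <- rbinom(n, 1, 0.5)  # random assignment to question wording

    # Only the potential outcome matching the assignment is ever observed.
    po <- data.frame(
      respondent = seq_len(n),
      y0 = ifelse(treated == 0, y0, NA),
      y1 = ifelse(treated == 1, y1, NA),
      treatment = ifelse(treated == 1, "assistance", "welfare"),
      observed = ifelse(treated == 1, y1, y0)
    )
    ```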

  14. Conditional Average Treatment Effect (CATE)
    The average treatment effect is useful: it allows us to compare different
    treatments for overall effectiveness. But it is a population average, so it hides
    variation between respondents. A more informative measure is the conditional
    average treatment effect (CATE),
    CATE(x) = E[Y(1) − Y(0) | X = x].
    It gives the effect of the treatment on the group with a particular value x of the
    pre-treatment covariates.
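
    For a single discrete covariate, the CATE can be estimated directly as a difference in means within each group. The sketch below uses simulated data; the baseline rates and effect sizes per political view are made up and do not come from the GSS.

    ```r
    set.seed(7)
    n <- 6000
    sim <- data.frame(
      polviews = sample(c("liberal", "moderate", "conservative"), n, replace = TRUE),
      treated = rbinom(n, 1, 0.5),
      stringsAsFactors = FALSE
    )
    # Hypothetical truth: the wording effect is larger for conservatives.
    base   <- c(liberal = 0.30, moderate = 0.45, conservative = 0.65)
    effect <- c(liberal = -0.20, moderate = -0.30, conservative = -0.45)
    p <- unname(base[sim$polviews] + effect[sim$polviews] * sim$treated)
    sim$response <- rbinom(n, 1, p)

    # CATE(x): treated mean minus control mean within each value of x.
    arm_means <- tapply(sim$response, list(sim$polviews, sim$treated), mean)
    cate_by_group <- arm_means[, "1"] - arm_means[, "0"]
    ```

    This only works when each cell has enough treated and control respondents, which is why the modeling approaches later in the talk are needed for many or continuous covariates.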

  15. Example: CATE of Political Views

  16. Modeling Approaches


  17. Missing Data Problem
    respondent   Yi(0)        Yi(1)      treatment    age   …   educ
    1            ?            too much   assistance   34    …   16
    2            too little   ?          welfare      41    …   12
    3            too little   ?          welfare      53    …   20
    Let’s use machine learning to estimate the missing potential outcomes.

  18. Interactive Model
    $\hat{\mu} = M(Y \sim (X, T))$
    $\mathrm{CATE}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$
    • use any ML/statistical model M(·, ·)
    • include the treatment indicator T
    • include all treatment and covariate interactions
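
    With a parametric model the interactions have to be written out explicitly. Below is a minimal sketch using logistic regression as M on simulated data; the data-generating process and the variable names in `sim` are assumptions for illustration only.

    ```r
    set.seed(11)
    n <- 4000
    sim <- data.frame(age = runif(n, 18, 80),
                      educ = sample(8:20, n, replace = TRUE),
                      treated = rbinom(n, 1, 0.5))
    # Hypothetical truth: the treatment effect shrinks as education increases.
    lin <- -0.5 + 0.01 * sim$age + sim$treated * (-2.5 + 0.1 * sim$educ)
    sim$response <- rbinom(n, 1, plogis(lin))

    # treated * (age + educ) expands to all main effects plus every
    # treatment-by-covariate interaction.
    m <- glm(response ~ treated * (age + educ), family = binomial, data = sim)

    # Predict each respondent under both arms and take the difference.
    mu_1 <- predict(m, transform(sim, treated = 1), type = "response")
    mu_0 <- predict(m, transform(sim, treated = 0), type = "response")
    cate <- mu_1 - mu_0
    ```

    Tree-based learners can pick up such interactions on their own, so with those it is enough to pass the treatment indicator as one more feature.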

  19. Interactive Model
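    library(dplyr)
    library(randomForest)
    # type = "prob" needs a classification forest, so response must be a factor
    gss_rf <- gss %>% mutate(response = factor(response))
    # single model with the treatment indicator as one of the features
    m <- randomForest(response ~ ., data = gss_rf)
    # counterfactual copies of the data: everyone treated, everyone control
    gss_treated <- gss_rf %>%
      mutate(treatment = factor("assistance",
                                levels = c("welfare", "assistance")))
    gss_control <- gss_rf %>%
      mutate(treatment = factor("welfare",
                                levels = c("welfare", "assistance")))
    # predicted probability of the second response level under each arm
    y_1 <- predict(m, gss_treated, type = "prob")[, 2]
    y_0 <- predict(m, gss_control, type = "prob")[, 2]
    cate <- y_1 - y_0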

  20. Model Evaluation


  21. Evaluation
    Figuring out if we have a decent model is tough. We never observe the same
    person under both treatment and control, so there is no observed individual
    effect to score predictions against, and we can’t use traditional metrics like
    MSE or accuracy.

  22. True vs Predicted CATE Quantiles
    • Using a holdout set or cross-validation, get a set of out-of-sample treatment
    effect scores from a given model.
    • Bin those scores into quantiles and calculate the observed ATE (treated mean
    minus control mean) within each quantile.
    • Check that the observed ATEs are ordered as predicted and see how they compare
    to the average prediction in each quantile.
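
    The procedure can be sketched as follows. The data are simulated, and `score` is a stand-in for out-of-sample model predictions (the hypothetical true effect plus noise), not output from any model in this talk.

    ```r
    set.seed(3)
    n <- 10000
    x <- runif(n)
    treated <- rbinom(n, 1, 0.5)
    true_cate <- -0.4 * x                     # hypothetical truth
    y <- rbinom(n, 1, 0.5 + true_cate * treated)
    # Stand-in for out-of-sample model scores: truth plus noise.
    score <- true_cate + rnorm(n, sd = 0.05)

    # Bin the scores into quintiles (bin 1 holds the lowest scores).
    bins <- cut(score, quantile(score, probs = seq(0, 1, 0.2)),
                include.lowest = TRUE, labels = FALSE)

    # Observed ATE (difference in means) and mean prediction within each bin.
    observed_ate <- sapply(split(data.frame(y, treated), bins), function(d) {
      mean(d$y[d$treated == 1]) - mean(d$y[d$treated == 0])
    })
    predicted_ate <- tapply(score, bins, mean)
    ```

    Plotting `observed_ate` against `predicted_ate` gives the quantile comparison: a useful model shows the two moving together across bins.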

  23. Uplift
    • The uplift curve represents the incremental gain from using the model to target
    effort or outreach.
    • Similar to the quantile plot, rank observations by predicted treatment effect
    and compare to the actual ATE in each group (red line).
    • Compare this to randomly ordering observations (blue line).
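
    One common way to build such a curve is sketched below on simulated data. The helper `uplift_curve()` is hypothetical (it is not taken from the hete package), the effect is simulated as positive so that higher scores mean more gain, and `score` stands in for model predictions.

    ```r
    set.seed(5)
    n <- 10000
    x <- runif(n)
    treated <- rbinom(n, 1, 0.5)
    true_cate <- 0.4 * x                        # hypothetical positive effect
    y <- rbinom(n, 1, 0.2 + true_cate * treated)
    score <- true_cate + rnorm(n, sd = 0.05)    # stand-in for model predictions

    # Cumulative gain after targeting the first k observations in a given order:
    # (treated mean - control mean) within the first k, multiplied by k.
    uplift_curve <- function(y, treated, ord, fractions = seq(0.1, 1, 0.1)) {
      y <- y[ord]
      treated <- treated[ord]
      sapply(fractions, function(f) {
        top <- seq_len(round(f * length(y)))
        ate_k <- mean(y[top][treated[top] == 1]) - mean(y[top][treated[top] == 0])
        ate_k * length(top)
      })
    }

    model_curve  <- uplift_curve(y, treated, order(score, decreasing = TRUE))
    random_curve <- uplift_curve(y, treated, sample(n))
    ```

    Both curves end at the same point, since targeting everyone yields the overall ATE regardless of ordering; the model is useful to the extent its curve sits above the random one before that.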

  24. qini [4]
    • The qini coefficient is analogous to the area under the ROC curve (AUC) for
    supervised learning.
    • It is a single metric we can use to compare models fit to the same task.
    • Its scale depends on the data, so we can’t use it to compare models in absolute
    terms.
    [4] Radcliffe and Surry, “Real-world uplift modelling with significance-based uplift trees”.
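
    A rough sketch of one unnormalized variant of the qini coefficient, on simulated data. The functions `qini_curve()` and `qini_coefficient()` are hypothetical helpers written for this illustration, and published definitions differ in how they normalize the area.

    ```r
    set.seed(8)
    n <- 10000
    x <- runif(n)
    treated <- rbinom(n, 1, 0.5)
    y <- rbinom(n, 1, 0.2 + 0.4 * x * treated)   # hypothetical effect grows with x
    good_score  <- 0.4 * x + rnorm(n, sd = 0.05) # informative stand-in scores
    noise_score <- rnorm(n)                      # uninformative scores

    # Incremental positive responses among the top k observations ranked by score.
    qini_curve <- function(y, treated, score) {
      ord <- order(score, decreasing = TRUE)
      y <- y[ord]
      treated <- treated[ord]
      n_t <- cumsum(treated)            # treated count in the top k
      n_c <- cumsum(1 - treated)        # control count in the top k
      r_t <- cumsum(y * treated)        # treated positive responses in the top k
      r_c <- cumsum(y * (1 - treated))  # control positive responses in the top k
      # Rescale control responses to the treated group size before subtracting.
      ifelse(n_c > 0, r_t - r_c * n_t / n_c, 0)
    }

    # Average gap between the model curve and the straight random-targeting line.
    qini_coefficient <- function(y, treated, score) {
      q <- qini_curve(y, treated, score)
      k <- seq_along(q)
      random_line <- q[length(q)] * k / length(q)
      mean(q - random_line)
    }

    q_good  <- qini_coefficient(y, treated, good_score)
    q_noise <- qini_coefficient(y, treated, noise_score)
    ```

    Because this value is measured in responses, it grows with sample size and effect size, which is why it only ranks models fit to the same data.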

  25. Modeling Approaches, Continued


  26. Split Model
    $\hat{\mu}_0 = M_0(Y_0 \sim X_0)$
    $\hat{\mu}_1 = M_1(Y_1 \sim X_1)$
    $\mathrm{CATE}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$
    • use two models
    • M0(·) estimated with control group
    • M1(·) estimated with treatment group

  27. Split Model
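    library(dplyr)
    library(randomForest)
    # type = "prob" needs a classification forest, so response must be a factor
    gss_rf <- gss %>% mutate(response = factor(response))
    # one forest per arm; treatment is constant within each subset, so drop it
    m0 <- randomForest(response ~ . -treatment,
                       data = filter(gss_rf, treatment == "welfare"))
    m1 <- randomForest(response ~ . -treatment,
                       data = filter(gss_rf, treatment == "assistance"))
    # predict every respondent under control (m0) and under treatment (m1)
    y_0 <- predict(m0, gss_rf, type = "prob")[, 2]
    y_1 <- predict(m1, gss_rf, type = "prob")[, 2]
    cate <- y_1 - y_0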

  28. X-Learner [5]
    $\hat{\mu}_0(x) = M_1(Y_0 \sim X_0)$
    $\hat{\mu}_1(x) = M_2(Y_1 \sim X_1)$
    $\tilde{D}_0 = \hat{\mu}_1(X_0) - Y_0$
    $\tilde{D}_1 = Y_1 - \hat{\mu}_0(X_1)$
    $\mathrm{CATE}_0(x) = M_4(\tilde{D}_0 \sim X_0)$
    $\mathrm{CATE}_1(x) = M_3(\tilde{D}_1 \sim X_1)$
    $\mathrm{CATE}(x) = g(x)\,\mathrm{CATE}_0(x) + (1 - g(x))\,\mathrm{CATE}_1(x)$
    • $M_1$ and $M_2$ estimate the response in the control and treatment groups
    • $\tilde{D}_1$ and $\tilde{D}_0$ are the imputed treatment effects
    • g(x) is a weighting function, typically the propensity score or treated fraction
    [5] Künzel et al., “Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning”.
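
    The steps above can be sketched with linear regression as every base learner. This is an illustration on simulated data with a continuous outcome, not the implementation from the paper or from the hete package; the data-generating process and all variable names are assumptions.

    ```r
    set.seed(9)
    n <- 4000
    x <- runif(n, -1, 1)
    treated <- rbinom(n, 1, 0.5)
    true_cate <- 1 + 2 * x                    # hypothetical truth
    y <- x + true_cate * treated + rnorm(n)
    d  <- data.frame(x, y, treated)
    d0 <- d[d$treated == 0, ]                 # control group
    d1 <- d[d$treated == 1, ]                 # treatment group

    # Step 1: outcome models for each arm (M1 and M2 in the slide's notation).
    mu_0 <- lm(y ~ x, data = d0)
    mu_1 <- lm(y ~ x, data = d1)

    # Step 2: imputed treatment effects, using the other arm's outcome model.
    d0$d_tilde <- predict(mu_1, d0) - d0$y
    d1$d_tilde <- d1$y - predict(mu_0, d1)

    # Step 3: model the imputed effects within each arm (M4 on control, M3 on treated).
    cate_0 <- lm(d_tilde ~ x, data = d0)
    cate_1 <- lm(d_tilde ~ x, data = d1)

    # Step 4: combine with g(x); here a constant equal to the treated fraction.
    g <- mean(d$treated)
    cate <- g * predict(cate_0, d) + (1 - g) * predict(cate_1, d)
    ```

    Any regression learner could replace `lm()` in steps 1 and 3; the structure of the four steps stays the same.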

  29. Generalized Random Forest (GRF) [6] [7]
    • CART/RandomForest inspired
    • directly estimates CATE
    • guarantees for consistency and bias
    • proper confidence intervals
    • CRAN: grf
    [6] Athey, Tibshirani, and Wager, “Generalized random forests”.
    [7] D’Agostino and Lattner, The power of persuasion modeling.

  30. GRF
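    library(grf)
    # numeric covariate matrix, excluding the treatment indicator
    x <- model.matrix(response ~ . -treatment, data = gss)
    # numeric outcome, 1 if the respondent answered "too much"
    y <- gss$response
    # binary treatment: 0 = "welfare" wording, 1 = "assistance" wording
    tmt <- ifelse(gss$treatment == "welfare", 0, 1)
    m <- causal_forest(x, y, tmt)
    # out-of-bag CATE estimates together with their variance estimates
    cate <- predict(m, estimate.variance = TRUE)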
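
    With `estimate.variance = TRUE`, the prediction result carries a `predictions` column and a `variance.estimates` column, from which normal-approximation intervals can be formed. The helper `cate_interval()` below is hypothetical, and `fake_pred` contains made-up numbers shaped like that output so the sketch runs without grf installed.

    ```r
    # `pred` is assumed to be shaped like the data frame returned by predict()
    # on a causal forest with estimate.variance = TRUE.
    cate_interval <- function(pred, level = 0.95) {
      z  <- qnorm(1 - (1 - level) / 2)
      se <- sqrt(pred$variance.estimates)
      data.frame(estimate = pred$predictions,
                 lower = pred$predictions - z * se,
                 upper = pred$predictions + z * se)
    }

    # Made-up numbers standing in for real forest output.
    fake_pred <- data.frame(predictions = c(-0.40, -0.25, -0.10),
                            variance.estimates = c(0.0025, 0.0100, 0.0400))
    ci <- cate_interval(fake_pred)
    # An estimate is distinguishable from zero when its interval excludes 0.
    ci$excludes_zero <- ci$lower > 0 | ci$upper < 0
    ```

    On real output the call would be `cate_interval(cate)`, giving one interval per respondent.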

  31. hete Package


  32. hete Package
    • interactive, split, and x-learner
    • formula interface
    • plug in any ML model/estimator
    • uplift curve
    • plots
    • on GitHub: github.com/wlattner/hete

  33. hete Package
    m <- hete_single(response ~ year + educ + age | treatment,
    data = gss, est = random_forest)
    plot(m)
    cate <- predict(m, gss)

  34. So, should we rebrand “welfare”?

  35. Rebrand?
    Probably!

  36. References i
    Angrist, Joshua D. and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, Dec. 2008. ISBN: 0691120358.
    Athey, Susan, Julie Tibshirani, and Stefan Wager. “Generalized random forests”. In: arXiv preprint arXiv:1610.01271 (2016).
    Chernozhukov, Victor et al. Double machine learning for treatment and causal parameters. Tech. rep. cemmap working paper, Centre for Microdata Methods and Practice, 2016.
    D’Agostino, Michelangelo and Bill Lattner. The power of persuasion modeling. Talk at Strata + Hadoop World, San Jose, CA. 2017.

  37. References ii
    Green, Donald P and Holger L Kern. “Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees”. In: Public Opinion Quarterly 76.3 (2012), pp. 491–511.
    Imbens, Guido W and Donald B Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press, 2015.
    Künzel, Sören R et al. “Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning”. In: arXiv preprint arXiv:1706.03461 (2017).
    Radcliffe, Nicholas J and Patrick D Surry. “Real-world uplift modelling with significance-based uplift trees”. In: White Paper TR-2011-1, Stochastic Solutions (2011).

  38. References iii
    Rubin, Donald B. “Estimating causal effects of treatments in randomized and nonrandomized studies”. In: Journal of Educational Psychology 66.5 (1974), p. 688.

  39. Thank you!
    Twitter @wlattner
    GitHub github.com/wlattner
    Slides goo.gl/v5ATvN
    GSS Data gss.norc.org
    grf Package github.com/swager/grf
    hete Package github.com/wlattner/hete