
Causal Inference 2022 Week 1

Will Lowe
March 01, 2022


  1. PLAN 1 → Causal explanation → Potential outcomes, equations, and graphs
     → Effects and how to estimate them → Graphs and interventions
     → Conditioning for fun and profit. "When we understand that slide, we'll
     have won the war" (Gen. Stanley McChrystal)
  3. EXPLANATION 2 Q: Why? A: Because! Causal inference is the main form of
     explanation in social science. For our purposes, we'll care about
     causation when we care about → understanding how institutions work
     → evaluating policy impact → defining and evaluating fairness /
     discrimination. You may invoke my principles or the law to explain my
     actions, but if those are not also causes of my action, there's no
     explanation. To measure democracy, social trust, or political ideology we
     build measurement models linking items to a construct. In the best
     models, the construct causes the items.
  5. CAUSAL EXPLANATION 3 Causal explanation is contrastive (Sober).
     → Reporter: Why do you rob banks?
     → Willy Sutton: Because that's where the money is. (Alas, apocryphal:
     Sutton & Linn.) Three possible causal questions here: → Why banks rather
     than post offices? → Why robbing rather than working? → Why do it
     yourself rather than hiring a gang? Each one invokes a different contrast
     and a different causal question.
  6. CAUSAL EXPLANATION 4 Causal explanation is contrastive (Sober).
     → Reporter: Why do you rob banks?
     → Willy Sutton: Because that's where the money is. (Alas, apocryphal:
     Sutton & Linn.) Sutton himself had a different view: "Why did I rob
     banks? Because I enjoyed it. I loved it. I was more alive when I was
     inside a bank, robbing it, than at any other time in my life."
  9. CAUSES TO EFFECTS 5 Reasoning from effects to causes is hard → Many
     possible contrasts and mechanisms. We'll approach things from the other
     direction: from causes to effects. What is the effect of T (1: press,
     0: don't press) on Y (1: bang, 0: no bang)?
     → Are T and Y (cor)related?
     → Would setting T to 1 lead to Y = 1?
     → Y = 1, but would Y have been 1 had T not been 1?
  10. HOW TO THINK FORMALLY ABOUT CAUSATION 6 We're going to use three
     closely related formal frameworks for thinking systematically about
     causation: 1. Structural equations 2. Directed acyclic graphs (DAGs)
     3. Potential outcomes. Structural equations imply facts about → What
     happens when you intervene on variables (dismember graphs) → The size
     and direction of effects (differences of averages of actual and
     counterfactual outcomes). These correspond to a focus on 1. Nature:
     mechanisms. Rubin shorthand: 'the Science' 2. Nature's joints: how
     variables relate statistically when generated by these mechanisms
     3. Nature's creatures: how actual and possible cases or realizations
     relate to one another.
  11. POTENTIAL OUTCOMES, EQUATIONS, GRAPHS 7 Does watching Sesame Street
     improve children's reading? (Murphy) An 'encouragement design':
     → Encouraged: {0, 1} → Watched: {0, 1} → Letter recognition score: [ , ]
     → Previous score: [ , ] → Grade → Likes children's TV, SES, etc.
     (unmeasured). We're interested in the effect of watching W on letter
     recognition score L. "Big deal. Hey, let's get this over with. I never
     watch television anyway. It's too trashy, even for me." (Bamberger)
  14. POTENTIAL OUTCOMES 8 Does watching Sesame Street improve reading?
     (Murphy) → Encouraged: {0, 1} → Watched: {0, 1} → Letter recognition
     score: [ , ] → Previous score: [ , ] → Grade, Likes children's TV...
     We're interested in the effect of watching W on letter recognition score
     L. Potential outcomes: child i's letter score → after watching:
     L(W=1)_i → after not watching: L(W=0)_i. We'll usually shorten these to
     L(1)_i and L(0)_i. Observed outcome:
     L_i = I[W_i = 1] L(1)_i + I[W_i = 0] L(0)_i.
     Individual treatment effect: ∆L_i = L(1)_i − L(0)_i. Alas, only one of
     the two potential outcomes is ever observed (Holland).
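As a minimal sketch (all numbers hypothetical), the bookkeeping above can be simulated; note that the individual effects are recoverable here only because the simulation, unlike reality, keeps both potential outcomes:

```python
import numpy as np

# Hypothetical simulation: both potential outcomes exist for every child,
# but an analyst only ever observes the one matching actual watching status.
rng = np.random.default_rng(0)
n = 1_000
L0 = rng.normal(50, 10, n)   # L(0)_i: letter score if child i does not watch
L1 = L0 + 5                  # L(1)_i: letter score if child i watches
W = rng.integers(0, 2, n)    # who actually watched

# Observed outcome: L_i = I[W_i = 1] L(1)_i + I[W_i = 0] L(0)_i
L = np.where(W == 1, L1, L0)

# Individual effects exist in the simulation but are never observed in data
ite = L1 - L0
print(ite.mean())
```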
  17. EFFECTS 9 Average treatment effect: in N children the average treatment
     effect of watching is ATE = (1/N) Σ_i ∆L_i ⇒ E[∆L] = E[L(1) − L(0)] =
     E[L(1)] − E[L(0)]. Average treatment effect on the treated:
     ATT = E[L(1) − L(0) | W = 1] = E[L(1) | W = 1] − E[L(0) | W = 1],
     i.e. the effect among children who watch. Interpretation: the ATE is
     what you'd expect watching to do to the reading score of a randomly
     selected child. As a recipe: → 'Make' every child watch, record scores
     → 'Make' every child not watch, record scores → Subtract and average.
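Under randomization these estimands line up with what a difference of observed means recovers; a quick simulation with hypothetical numbers:

```python
import numpy as np

# Hypothetical numbers: heterogeneous effects, W assigned by coin flip,
# so W is independent of (L(0), L(1)).
rng = np.random.default_rng(1)
n = 100_000
L0 = rng.normal(50, 10, n)
L1 = L0 + rng.normal(5, 2, n)          # individual effects vary around 5
W = rng.integers(0, 2, n)              # randomized treatment
L = np.where(W == 1, L1, L0)

ate = (L1 - L0).mean()                 # E[L(1) - L(0)]
att = (L1 - L0)[W == 1].mean()         # E[L(1) - L(0) | W = 1]
diff_means = L[W == 1].mean() - L[W == 0].mean()
print(round(ate, 2), round(att, 2), round(diff_means, 2))
```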
  20. EFFECTS 10 Treatment effects are local tangent approximations to policy
     responses under a complete model of media-education effects → Assuming
     no spillover, no multiple versions of treatment, and no general
     equilibrium effects. Often summarized as the 'stable unit treatment
     value assumption' (SUTVA, Rubin): "SUTVA is simply the a priori
     assumption that the value of Y for unit u when exposed to treatment t
     will be the same no matter what mechanism is used to assign treatment t
     to unit u and no matter what treatments the other units receive."
     Effects are more credibly predictive when → more local → closer to the
     status quo.
  23. ESTIMATING EFFECTS 11 Suppose we know that P(W = 1) = π. Then
     ATE = [π E[L(1) | W = 1] + (1 − π) E[L(1) | W = 0]]
         − [π E[L(0) | W = 1] + (1 − π) E[L(0) | W = 0]].
     Maybe you want to estimate this estimand using
     δ̂ = E_N[L | W = 1] − E_N[L | W = 0] → E[L | W = 1] − E[L | W = 0] = δ.
     Alas, δ is not usually the ATE (so making δ̂ really good won't help):¹
     δ = ATE (the right answer)
       + (E[L(0) | W = 1] − E[L(0) | W = 0]) (baseline group differences)
       + (1 − π)(E[∆L | W = 1] − E[∆L | W = 0]) (group-specific treatment
     effects). ¹Looking at you, machine learning.
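The decomposition can be checked numerically. In this hypothetical simulation children with higher baseline scores select into watching, so δ̂ overshoots the true ATE by exactly the baseline gap (the heterogeneity term is zero because the effect is constant):

```python
import numpy as np

# Hypothetical selection-on-baseline setting; true ATE is 5 by construction.
rng = np.random.default_rng(2)
n = 200_000
L0 = rng.normal(50, 10, n)
L1 = L0 + 5                              # constant effect: no heterogeneity term
W = (L0 - 50 + rng.normal(0, 10, n) > 0).astype(int)  # better readers watch more
L = np.where(W == 1, L1, L0)

delta_hat = L[W == 1].mean() - L[W == 0].mean()
baseline_gap = L0[W == 1].mean() - L0[W == 0].mean()  # E[L(0)|W=1] - E[L(0)|W=0]
print(round(delta_hat, 2), round(5 + baseline_gap, 2))  # the two agree
```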
  27. 'NO CAUSES IN, NO CAUSES OUT' 12 The trick: treat potential outcomes as
     random variables and assume (or make) this true: (L(0), L(1)) ⊥⊥ W.
     Implications: → No systematic relationship between being assigned to
     watch Sesame Street W and being a better or worse letter recognizer
     L(⋅). → Treatment effects ∆L_i = L(1)_i − L(0)_i are not systematically
     bigger (or smaller) for those who watch than for those who don't. How
     does this help? If (L(0), L(1)) ⊥⊥ W then
     E[L(1)_i | W_i = 1] = E[L(1)_i | W_i = 0] and
     E[L(0)_i | W_i = 1] = E[L(0)_i | W_i = 0]. Model case: an experiment
     with W randomized. If we can't assume (or make) that true, maybe
     (L(0), L(1)) ⊥⊥ W | G. Model case: an experiment with W randomized and
     G controlled, e.g. an RCT.
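A sketch of the conditional version (hypothetical parameters): W is randomized within grade G but with grade-specific probabilities, so the naive contrast is biased while within-G contrasts, averaged over P(G), recover the effect:

```python
import numpy as np

# Hypothetical setup: ignorability holds only conditional on grade G.
rng = np.random.default_rng(3)
n = 200_000
G = rng.integers(0, 2, n)                  # grade (two strata)
L0 = rng.normal(50, 10, n) + 10 * G        # older children read better anyway
L1 = L0 + 5                                # true effect is 5
p_watch = np.where(G == 1, 0.8, 0.2)       # watching probability depends on G
W = (rng.random(n) < p_watch).astype(int)
L = np.where(W == 1, L1, L0)

naive = L[W == 1].mean() - L[W == 0].mean()          # biased: G confounds
strata = [L[(W == 1) & (G == g)].mean() - L[(W == 0) & (G == g)].mean()
          for g in (0, 1)]
adjusted = np.average(strata, weights=[(G == 0).mean(), (G == 1).mean()])
print(round(naive, 2), round(adjusted, 2))           # naive is off; adjusted ~ 5
```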
  28. EQUATIONS AND GRAPHS 13 Does encouraging children to watch Sesame
     Street improve reading? → Encouraged: {0, 1} → Watched: {0, 1} → Letter
     recognition score: [ , ] → Previous score: [ , ] → Grade, Likes
     children's TV... [Figure: a Directed Acyclic Graph over E, W, L, P, G,
     T; and a Directed Acyclic Graf]
  32. EQUATIONS AND GRAPHS 14 Does encouraging children to watch Sesame
     Street improve reading? → Encouraged: {0, 1} → Watched: {0, 1} → Letter
     recognition score: [ , ] → Previous score: [ , ] → Grade, Likes
     children's TV... [DAG over E, W, L, P, G, T] Structure: nodes are
     variables, filled if observed, hollow if unobserved / latent. Each arrow
     represents a distinct mechanism underlying a (direct) causal effect. The
     functional forms are unspecified → Regression modeling is one such
     story: E[L] = f(W, T, G, P). 'Noise' (external variables) affects each
     node (not drawn) → L = E[L] + ε_L. This induces a joint probability
     distribution with specific conditional independencies.
  36. EQUATIONS AND GRAPHS 15 Does encouraging children to watch Sesame
     Street improve reading? → Encouraged: {0, 1} → Watched: {0, 1} → Letter
     recognition score: [ , ] → Previous score: [ , ] → Grade, Likes
     children's TV... [DAG over E, W, L, P, G, T] Probability: each mechanism
     (with its attendant independent noise) corresponds to a probability
     distribution: P(L, P, W, G, T, E) = P(E) × P(T) × P(G) × P(P) ×
     P(W | E, T, G) × P(L | P, T, G, W). The graph implies independencies
     which are observable, e.g. E ⊥⊥ L | W, T, G. When we intervene on this
     system we set some variables → Exit: W ∼ P(W | E, T, G) → Enter:
     W = w, a value we choose.
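The factorization is also a sampling recipe: draw each variable from its factor in topological order. In this hypothetical parametrization (all coefficients invented) E affects L only through W, so the implied independence E ⊥⊥ L | W, T, G shows up as a near-zero gap within strata:

```python
import numpy as np

# Ancestral sampling following the factorization, with made-up mechanisms.
rng = np.random.default_rng(4)
n = 200_000
E = rng.integers(0, 2, n)                  # P(E): encouragement
T = rng.integers(0, 2, n)                  # P(T): likes children's TV
G = rng.integers(0, 2, n)                  # P(G): grade
P = rng.normal(50, 10, n)                  # P(P): previous score

# P(W | E, T, G): watching responds to encouragement, tastes, grade
p_w = 0.2 + 0.3 * E + 0.2 * T + 0.1 * G
W = (rng.random(n) < p_w).astype(int)

# P(L | P, T, G, W): letter score has no direct E input
L = 0.5 * P + 5 * W + 3 * T + 4 * G + rng.normal(0, 5, n)

# Within a (W, T, G) stratum, E should tell us nothing about L
mask = (T == 0) & (G == 0) & (W == 1)
gap = L[mask & (E == 1)].mean() - L[mask & (E == 0)].mean()
print(round(gap, 2))                       # close to 0
```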
  38. GRAPHS AND INTERVENTIONS 16 Does encouraging children to watch Sesame
     Street improve reading? → Encouraged: {0, 1} → Watched: {0, 1} → Letter
     recognition score: [ , ] → Previous score: [ , ] → Grade, Likes
     children's TV... [DAG over E, W, L, P, G, T] Definition: the causal
     effect of W is what happens to L if you remove W's inbound connections
     and change its value → an intervention, e.g. randomizing W, makes a new
     graph → in this new graph (L(0), L(1)) ⊥⊥ W. [Second DAG with the
     arrows into W removed] The task: learn about the graph above
     (post-intervention) with data from the graph on the left (observational).
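Graph surgery in simulation form (hypothetical mechanisms): L's structural equation is held fixed while W's assignment either depends on the taste variable T (observational graph) or is randomized (arrows into W removed):

```python
import numpy as np

# Same mechanism for L under both regimes; only W's assignment changes.
rng = np.random.default_rng(5)
n = 200_000
T = rng.integers(0, 2, n)                  # likes children's TV

def letter_score(W, T):
    # L's mechanism is unchanged by how W was assigned
    return 50 + 5 * W + 8 * T + rng.normal(0, 5, len(W))

W_nat = (rng.random(n) < 0.2 + 0.6 * T).astype(int)  # W <- T (observational)
W_do = rng.integers(0, 2, n)                         # do(): inbound arrows cut

L_nat = letter_score(W_nat, T)
L_do = letter_score(W_do, T)
naive = L_nat[W_nat == 1].mean() - L_nat[W_nat == 0].mean()
experimental = L_do[W_do == 1].mean() - L_do[W_do == 0].mean()
print(round(naive, 2), round(experimental, 2))  # naive > 5; experimental ~ 5
```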
  39. TERMINOLOGY: IDENTIFICATION 17 We have an estimand, e.g.
     E[Y | do(X = x)], a.k.a. E[Y(X=x)], which implicitly depends on some
     distribution P(Y | do(X = x)), a.k.a. P(Y(X=x)). And we have data that
     tells us about conditional probabilities and independencies and
     correlations, e.g. P(Y | X) and regression functions E[Y | X] and all
     that good stuff. Rather often P(Y(X=x)) ≠ P(Y | X). But we identify
     (assert identity between) an estimand (left) with some function of
     observables (right). The plan for figuring out what to put on the right
     is an identification strategy.
  41. GRAPHS AND INTERVENTIONS 18 Does encouraging children to watch Sesame
     Street improve reading? → Encouraged: {0, 1} → Watched: {0, 1} → Letter
     recognition score: [ , ] → Previous score: [ , ] → Grade, Likes
     children's TV... [DAG over E, W, L, P, G, T] Properties:
     → The effect of W on L is W → L
     → T and G confound the W − L relationship
     → If T were observed then (L(0), L(1)) ⊥⊥ W | T, G
     → Since T is not observed, this is false
     → E is a potential instrument
     → Conditioning on P may increase precision
     → Conditioning on P will not offset the consequences of unmeasured T.
     None of this depends on functional details, e.g. interactions,
     additivity, non-linearity, etc. How can we tell all that? Let's find out!
  42. ESSENTIAL STRUCTURES 19 We'll think of graphs as compositions of the
     following three types of structures:
     → X → Z → Y: mediator (chain)
     → X ← Z → Y: common cause, fork, confounder
     → X → Z ← Y: collider, common effect
  44. OBSERVABLE IMPLICATIONS 20 Collider X → Z ← Y. Probability:
     P(X, Y, Z) = P(Z | X, Y) P(X | Y) P(Y) = P(Z | X, Y) P(X) P(Y).
     Testable implication: the second line says P(X, Y) = P(X) P(Y), i.e.
     Y ⊥⊥ X. Fork X ← Z → Y. Probability:
     P(X, Y, Z) = P(X | Z) P(Y | Z) P(Z). Testable implication: X and Y are
     not independent unless we condition on Z: X is not ⊥⊥ Y, but
     X ⊥⊥ Y | Z.
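The fork's two implications are easy to check numerically (hypothetical linear mechanisms):

```python
import numpy as np

# Fork sketch: Z causes both X and Y; marginal correlation appears,
# but none remains within a level of Z.
rng = np.random.default_rng(6)
n = 200_000
Z = rng.integers(0, 2, n)
X = Z + rng.normal(0, 1, n)
Y = Z + rng.normal(0, 1, n)

marginal = np.corrcoef(X, Y)[0, 1]             # X not independent of Y
within = np.corrcoef(X[Z == 0], Y[Z == 0])[0, 1]  # X indep of Y given Z
print(round(marginal, 2), round(within, 2))
```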
  46. KINDA? 22 Chain X → Z → Y. Probability:
     P(X, Y, Z) = P(Y | X, Z) P(Z | X) P(X) = P(Y | Z) P(Z | X) P(X).
     Testable implication: X and Y are not independent unless we condition on
     Z: X is not ⊥⊥ Y, but X ⊥⊥ Y | Z. From data alone we cannot
     distinguish Y ← Z ← X from Y ← Z → X. [sad trombone] Markov
     equivalence: two graphs are Markov equivalent when they have the same
     skeleton (same variables and links) and the same collider structures
     (Pearl & Verma). The rest is (often) algorithmically discoverable (see
     Shalizi for an introduction).
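A sketch of why data alone cannot separate the chain from the fork: with parameters tuned for the purpose (all values hypothetical), the two graphs generate the same covariance matrix:

```python
import numpy as np

# Markov-equivalence sketch: chain X -> Z -> Y vs fork X <- Z -> Y.
rng = np.random.default_rng(11)
n = 1_000_000

# Chain: X -> Z -> Y
X1 = rng.normal(0, 1, n)
Z1 = X1 + rng.normal(0, 1, n)
Y1 = Z1 + rng.normal(0, 1, n)

# Fork: X <- Z -> Y, tuned to match the chain's second moments
Z2 = rng.normal(0, np.sqrt(2), n)
X2 = 0.5 * Z2 + rng.normal(0, np.sqrt(0.5), n)
Y2 = Z2 + rng.normal(0, 1, n)

chain_cov = np.cov([X1, Z1, Y1])
fork_cov = np.cov([X2, Z2, Y2])
print(np.round(chain_cov, 2))
print(np.round(fork_cov, 2))    # same matrix, different causal stories
```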
  49. CONDITIONING 24 Conditioning on Z:
     → Subsetting a data set by values of Z
     → Filtering a data set to cases with a particular Z value
     → Blocking an experiment on Z
     → Stratification: dividing up cases by Z value, measuring a
     relationship, and averaging the results
     → Regression: adding Z as a predictor to a regression or ML model,
     a.k.a. 'controlling for' Z.
     Conditioning can → remove association where there was some (mediators
     and forks) → make association where there wasn't any (colliders and
     their children). [Diagrams: X ← Z → Y, common cause; X → Z ← Y, common
     effect / collider]
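Two of these conditioning operations can be compared directly in a hypothetical linear example: stratify-and-average and regression adjustment give essentially the same answer:

```python
import numpy as np

# Made-up linear setting: X's effect on Y is 2; Z shifts both X and Y.
rng = np.random.default_rng(7)
n = 100_000
Z = rng.integers(0, 2, n)
X = (rng.random(n) < 0.3 + 0.4 * Z).astype(int)
Y = 2 * X + 6 * Z + rng.normal(0, 1, n)

# Stratification: effect of X within each Z level, averaged over P(Z)
strata = [Y[(X == 1) & (Z == z)].mean() - Y[(X == 0) & (Z == z)].mean()
          for z in (0, 1)]
stratified = np.average(strata, weights=[(Z == 0).mean(), (Z == 1).mean()])

# Regression: coefficient on X 'controlling for' Z
design = np.column_stack([np.ones(n), X, Z])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(round(stratified, 2), round(coef[1], 2))   # both near 2
```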
  50. CONDITIONING 25 [Figure: two scatterplots of Reading/Writing against
     Mathematics scores, one with points colored by Private tutor (No/Yes),
     one with points colored by Admission (No/Yes)]
  53. CONFOUNDING 26 Fork with a direct effect: X ← Z → Y and X → Y.
     P(X, Y, Z) = P(Y | Z, X) P(X | Z) P(Z). So the distribution of Y given
     X is (by defn.) P(Y | X) = Σ_z P(Y | Z=z, X) P(Z=z | X), which is not
     P(Y | do(X = x)), a.k.a. P(Y(X=x)). After intervening on X (cutting
     Z → X): P(X, Y, Z) = P(Y | Z, X) P(Z) P(X). So the distribution of Y
     given X is P(Y | X) = Σ_z P(Y | Z=z, X) P(Z=z) (maybe regress Y against
     X controlling for Z?). That's what we want...
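The adjustment formula can be verified in a small binary simulation (probabilities invented for the example): the naive conditional overstates the interventional quantity, while the sum over Z recovers it:

```python
import numpy as np

# Fork Z -> X, Z -> Y with X -> Y, all binary, hypothetical probabilities.
rng = np.random.default_rng(8)
n = 500_000
Z = (rng.random(n) < 0.5).astype(int)
X = (rng.random(n) < 0.2 + 0.6 * Z).astype(int)
Y = (rng.random(n) < 0.1 + 0.3 * X + 0.4 * Z).astype(int)

naive = Y[X == 1].mean()                        # P(Y=1 | X=1): confounded
adjusted = sum(Y[(X == 1) & (Z == z)].mean() * (Z == z).mean()
               for z in (0, 1))                 # sum_z P(Y|Z=z,X=1) P(Z=z)
truth = 0.1 + 0.3 + 0.4 * 0.5                   # P(Y=1 | do(X=1)) = 0.6 here
print(round(naive, 3), round(adjusted, 3), round(truth, 3))
```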
  55. COLLIDER BIAS 27 Collider with a direct effect: X → Z ← Y and X → Y.
     P(X, Y, Z) = P(Y | X) P(Z | X, Y) P(X). So the distribution of Y given
     X is right there, and it is also P(Y | do(X = x)), a.k.a. P(Y(X=x)).
     After 'intervention' everything is the same... But if we nevertheless
     decide to compute P(Y | X) = Σ_z P(Y | Z=z, X) P(Z=z) (maybe regress Y
     against X controlling for Z?) then we get uninterpretable mush.
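A sketch of the mush with a hypothetical admission-style collider: X and Y are independent causes of Z, and conditioning on Z manufactures a strong association:

```python
import numpy as np

# Collider sketch: two independent scores jointly determine admission Z.
rng = np.random.default_rng(9)
n = 200_000
X = rng.normal(0, 1, n)
Y = rng.normal(0, 1, n)
Z = ((X + Y) > 1).astype(int)                  # admitted when the sum is high

marginal = np.corrcoef(X, Y)[0, 1]             # ~ 0: X and Y independent
among_admitted = np.corrcoef(X[Z == 1], Y[Z == 1])[0, 1]  # strongly negative
print(round(marginal, 2), round(among_admitted, 2))
```

Among the admitted, a high X implies Y did not need to be high, so the scores trade off; that is the association conditioning created.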
  56. MUSH 28 This is uninterpretable. The payoff: graph structure tells us
     (Pearl et al.)
     → What should be conditioned on and what should not: d-separation
     → What causal effects can be identified with only graph information and
     what needs more information
     → What causal effects can be identified by conditioning and which not.
     An extremely general theory for arbitrarily complicated DAGs.
  57. POTENTIAL OUTCOMES AGAIN 29 In a fork X ← Z → Y (with X → Y):
     (Y(0), Y(1)) is not ⊥⊥ X, but (Y(0), Y(1)) ⊥⊥ X | Z, so condition on
     Z! In a collider X → Z ← Y (with X → Y): (Y(0), Y(1)) ⊥⊥ X, but
     (Y(0), Y(1)) is not ⊥⊥ X | Z, so don't condition on Z!
     "Who knows what to condition on? The graph knows."
  58. REFERENCES 31
     Angrist, J. D., & Pischke, J.-S. "The credibility revolution in
     empirical economics: How better research design is taking the con out of
     econometrics." Journal of Economic Perspectives.
     Haavelmo, T. "The statistical implications of a system of simultaneous
     equations." Econometrica.
     Holland, P. W. "Statistics and causal inference." Journal of the
     American Statistical Association.
     Imbens, G. W. Potential outcome and directed acyclic graph approaches to
     causality: Relevance for empirical practice in economics (arXiv
     preprint).
     Leamer, E. E. "Let's take the con out of econometrics." The American
     Economic Review.
     Murphy, R. T. Educational effectiveness of Sesame Street: A review of
     the first twenty years of research (Report). Educational Testing
     Service.
     Neapolitan, R. E. Probabilistic reasoning in expert systems: Theory and
     algorithms. Wiley.
     Pearl, J. Probabilistic reasoning in intelligent systems: Networks of
     plausible inference. Kaufmann.
  59. REFERENCES 32
     Pearl, J. "Causal inference without counterfactuals: Comment." Journal
     of the American Statistical Association.
     Pearl, J., Glymour, M., & Jewell, N. P. Causal inference in statistics:
     A primer. Wiley.
     Pearl, J., & Verma, T. Equivalence and synthesis of causal models. In B.
     D'Ambrosio & P. Smets (Eds.), UAI: Proceedings of the seventh annual
     conference on uncertainty in artificial intelligence. Morgan Kaufmann.
     Rubin, D. "Which ifs have causal answers (Comment on 'Statistics and
     causal inference' by Paul W. Holland)." Journal of the American
     Statistical Association.
     Sesame Street: Elmo's Sing-Along Guessing Game and Elmocize.
     Shalizi, C. R. Advanced Data Analysis from an Elementary Point of View.
     Sober, E. "A theory of contrastive causal explanation and its
     implications concerning the explanatoriness of deterministic and
     probabilistic hypotheses." European Journal for Philosophy of Science.
     Strotz, R. H., & Wold, H. O. A. "Recursive vs. nonrecursive systems: An
     attempt at synthesis (Part I of a triptych on causal chain systems)."
     Econometrica.
     Sutton, W., & Linn, E. Where the money was. Broadway Books.
  60. REFERENCES 33
     Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. "Geometry of the
     faithfulness assumption in causal inference." The Annals of Statistics.
     Wright, S. "The method of path coefficients." The Annals of Mathematical
     Statistics.
  61. HISTORY: GRAPHS AND EQUATIONS 34 Biology, Statistics → Wright
     introduced path diagrams for genetics, and 'Wright's Rules'. Economics
     → Started early on graphs (Haavelmo; Strotz & Wold) → had a
     'credibility revolution' (Angrist & Pischke; Leamer) → now leans
     strongly towards potential outcomes (Imbens). Psychology, Sociology
     → Structural Equation Modeling (SEM) built after the Cowles Commission
     → developed by Jöreskog at ETS and Wold at Uppsala University → Still
     kind of tense about causal inference...
  62. HISTORY: GRAPHS AND EQUATIONS 35 Computer Science → In Bayesian expert
     systems research via probability on graphs, but with little causal focus
     (Neapolitan; Pearl) → In artificial intelligence: Pearl. Political
     science → A stronghold of the Neyman-Rubin causal model but
     increasingly graphical: Sekhon, Imai, Green. Epidemiology → Graphs and
     potential outcomes in equal measure, pioneered by Hernán and Robins
     → That's why we're reading their book.
  63. GRAPH LIMITATIONS: ACYCLICITY 36 Our DAG framework has trouble with
     → Things that are logically connected → Instantaneous feedback (but we
     can sometimes 'unroll' it) → Equilibrium relationships → Control
     relations. Dealing with causal inference with these features is an open
     research question → in case you fancied a PhD project in a few years...
     [Image: A Cyclic Graf]
  65. GRAPH LIMITATIONS: FAITHFULNESS 37 Consider the graph Z → X, Z → Y,
     X → Y, where Z = ε_Z, X = γ_0 + Z γ_Z + ε_X, and
     Y = β_0 + X β_X + Z β_Z + ε_Y. But what if β_X γ_Z = −β_Z? For every
     parameter combination like that, the two paths from Z to Y exactly
     cancel and Y ⊥⊥ Z, despite the presence of a link from Z to Y. This is
     an example of unfaithfulness: an independence relationship in the data
     not implied by the graph.
     → In theory: this never happens
     → In finite samples: this happens, but only by accident (although a lot
     of parameter space is nearly unfaithful; Uhler et al.)
     → In practice: more often because we make it happen.
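A numerical sketch of the cancellation, with hypothetical coefficients chosen so that the direct and indirect paths from Z to Y offset each other (β_X γ_Z = −β_Z):

```python
import numpy as np

# Unfaithfulness sketch: gamma_Z = 2, beta_X = -1.5, beta_Z = 3,
# so beta_X * gamma_Z = -beta_Z and the Z -> Y paths cancel exactly.
rng = np.random.default_rng(10)
n = 500_000
Z = rng.normal(0, 1, n)
X = 1.0 + 2.0 * Z + rng.normal(0, 1, n)              # gamma_Z = 2
Y = 0.5 - 1.5 * X + 3.0 * Z + rng.normal(0, 1, n)    # beta_X = -1.5, beta_Z = 3

corr_zy = np.corrcoef(Z, Y)[0, 1]    # ~ 0 despite the Z -> Y link
corr_xy = np.corrcoef(X, Y)[0, 1]    # X and Y remain clearly correlated
print(round(corr_zy, 3), round(corr_xy, 3))
```

A constraint-based discovery algorithm fed this data would wrongly drop the Z to Y link, which is exactly why faithfulness has to be assumed.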