Slide 1

Slide 1 text

IF CORRELATION DOES NOT IMPLY CAUSALITY then what does?

Slide 2

Slide 2 text

REFERENCE: “If correlation doesn’t imply causation, then what does?” by Michael Nielsen
http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/

Slide 3

Slide 3 text

CORRELATION DOES NOT IMPLY CAUSALITY

Slide 4

Slide 4 text

Credit: http://www.bloomberg.com/bw/magazine/correlation-or-causation-12012011-gfx.html

Slide 5

Slide 5 text

Credit: http://www.bloomberg.com/bw/magazine/correlation-or-causation-12012011-gfx.html

Slide 6

Slide 6 text

Credit: http://www.bloomberg.com/bw/magazine/correlation-or-causation-12012011-gfx.html

Slide 7

Slide 7 text

SIMPSON’S PARADOX Correlation can be reversed when you group data differently.

Slide 8

Slide 8 text

WAS BERKELEY SEXIST?

          Applicants   Admitted
Men       8442         44%
Women     4321         35%

Data source: admissions for the fall of 1973 at UC Berkeley

Slide 9

Slide 9 text

But if you look closely...

Department   Men                      Women
             Applicants   Admitted    Applicants   Admitted
A            825          62%         108          82%
B            560          63%         25           68%
C            325          37%         593          34%
D            417          33%         375          35%
E            191          28%         393          24%
F            272          6%          341          7%
Total        8442         44%         4321         35%

Slide 10

Slide 10 text

In most departments, admission rates for women are actually higher than for men.

Department   Men                      Women
             Applicants   Admitted    Applicants   Admitted
A            825          62%         108          82%
B            560          63%         25           68%
C            325          37%         593          34%
D            417          33%         375          35%
E            191          28%         393          24%
F            272          6%          341          7%
Total        8442         44%         4321         35%

Slide 11

Slide 11 text

WHY?!

Slide 12

Slide 12 text

Department   Men                      Women
             Applicants   Admitted    Applicants   Admitted
A            825          62%         108          82%
B            560          63%         25           68%
C            325          37%         593          34%
D            417          33%         375          35%
E            191          28%         393          24%
F            272          6%          341          7%
Total        8442         44%         4321         35%

Women tended to apply to competitive departments with low rates of admission.
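The reversal in the table above is easy to verify in a few lines of Python. This is a sketch using only the six departments and the rounded percentages shown on the slide:

```python
# Simpson's paradox with the UC Berkeley numbers from the table above.
depts = {
    #        men_apps, men_rate, women_apps, women_rate
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (272, 0.06, 341, 0.07),
}

# Departments where women's admission rate beats men's.
women_ahead = [d for d, (_, mr, _, wr) in depts.items() if wr > mr]

# Aggregate admission rates over these six departments.
men_apps = sum(m for m, _, _, _ in depts.values())
men_admits = sum(m * mr for m, mr, _, _ in depts.values())
women_apps = sum(w for _, _, w, _ in depts.values())
women_admits = sum(w * wr for _, _, w, wr in depts.values())

print(women_ahead)                 # women ahead in 4 of the 6 departments
print(men_admits / men_apps)       # aggregate men's rate, ~0.46
print(women_admits / women_apps)   # aggregate women's rate, ~0.30
```

Women lead in four of the six departments, yet their aggregate rate is far lower, because most female applicants went to the departments with low admission rates.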

Slide 13

Slide 13 text

80% of statistics are made up

Slide 14

Slide 14 text

80% of statistics are made up
        ˄ interpretation

Slide 15

Slide 15 text

“I often wonder how many people with real decision-making power – politicians, judges, and so on – are making decisions based on statistical studies, and yet they don’t understand even basic things like Simpson’s paradox.”
– Michael Nielsen

Slide 16

Slide 16 text

SO IF CORRELATION DOES NOT IMPLY CAUSALITY then what does?

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Judea Pearl
Causality theory
Bayesian networks
2011 Turing Award winner

Slide 19

Slide 19 text

AN EXAMPLE

Lung cancer rate
Smoker        17.2%
Non-smoker     1.3%

Slide 20

Slide 20 text

Does smoking cause cancer?
smoking → cancer ?

Slide 21

Slide 21 text

TOBACCO COMPANY WOULD SAY:
smoking ← hidden genetic factor → cancer

Slide 22

Slide 22 text

smoking cancer hidden genetic factor LET’S MAKE IT MORE GENERAL

Slide 23

Slide 23 text

Theoretically, to prove that smoking causes lung cancer, we would need to run a randomized controlled experiment.

Slide 24

Slide 24 text

smoking cancer hidden genetic factor RANDOMIZED CONTROLLED EXPERIMENT

Slide 25

Slide 25 text

smoking cancer hidden genetic factor Go smoke (50%) Don’t smoke (50%) RANDOMIZED CONTROLLED EXPERIMENT

Slide 26

Slide 26 text

After the experiment, we measure the percentage of people who get cancer:

Lung cancer rate
Smoker        P1
Non-smoker    P2

If P1 > P2, then we can conclude from the statistics that smoking causes lung cancer.

Slide 27

Slide 27 text

By forcing people (not) to smoke at random, you eliminate the hidden factor that may cause people to smoke. smoking cancer hidden genetic factor Experimenter
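A quick Monte Carlo sketch of why randomization works. All numbers below (gene frequency, smoking and cancer probabilities) are made up for illustration; in this toy world smoking has no causal effect at all, yet observational data make it look harmful:

```python
import random

random.seed(0)

def person(randomize):
    """One simulated person. The hidden gene drives BOTH smoking and
    cancer; smoking itself has no effect in this toy world."""
    gene = random.random() < 0.5
    if randomize:                       # experimenter assigns smoking
        smokes = random.random() < 0.5
    else:                               # the gene pushes people to smoke
        smokes = random.random() < (0.8 if gene else 0.2)
    cancer = random.random() < (0.3 if gene else 0.05)
    return smokes, cancer

def cancer_rates(randomize, n=200_000):
    counts = {True: [0, 0], False: [0, 0]}   # smokes -> [cancer, total]
    for _ in range(n):
        s, c = person(randomize)
        counts[s][0] += c
        counts[s][1] += 1
    return {s: k / t for s, (k, t) in counts.items()}

obs = cancer_rates(randomize=False)   # observational study
rct = cancer_rates(randomize=True)    # randomized controlled experiment

# Observationally, smokers get cancer far more often (the gene confounds);
# under randomization the gap disappears.
print(obs[True] - obs[False])   # ~0.15
print(rct[True] - rct[False])   # ~0.0
```

Randomizing who smokes breaks the gene → smoking arrow, so any remaining difference between groups would have to be causal.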

Slide 28

Slide 28 text

BUT FORCING PEOPLE IS INHUMANE!

Slide 29

Slide 29 text

Luckily, Pearl developed a causal calculus that allows us to make inferences without running such inhumane experiments.

Slide 30

Slide 30 text

Causal models Causal conditional probabilities d-separation 3 rules of causal calculus

Slide 31

Slide 31 text

CAUSAL MODEL (aka Bayesian Network) smoking cancer hidden genetic factor

Slide 32

Slide 32 text

CAUSAL MODEL (aka Bayesian Network) S C H

Slide 33

Slide 33 text

CAUSAL MODEL (aka Bayesian Network) S C H Random variables

Slide 34

Slide 34 text

CAUSAL MODEL (aka Bayesian Network) S C H Influences

Slide 35

Slide 35 text

PARENT NOTATION
pa(H) = { }
pa(C) = { H, S }
pa(S) = { H }

Slide 36

Slide 36 text

PERTURBED GRAPH
G: H → S, H → C, S → C
G_S: H → C, S → C   (edges pointing to S removed)
Remember the “inhumane experiment”?

Slide 37

Slide 37 text

PERTURBED GRAPH
G: H → S, H → C, S → C
G_S: H → C, S → C   (edges pointing to S removed)
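A minimal way to represent this in code is a parent map in the slides’ pa() notation; the perturbed graph G_S is then just a copy with the edges into S deleted. (The helper name perturb is ours, not standard terminology.)

```python
# A causal model as a parent map (pa), matching the slides' notation,
# plus the perturbed graph G_S used for the "inhumane experiment".
pa = {"H": set(), "S": {"H"}, "C": {"H", "S"}}

def perturb(pa, x):
    """Cut every edge pointing into x: the do-intervention graph G_x."""
    g = {node: set(parents) for node, parents in pa.items()}
    g[x] = set()
    return g

g_s = perturb(pa, "S")
print(pa["S"])   # the hidden gene influences smoking in G
print(g_s["S"])  # empty: in G_S the experimenter decides who smokes
```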

Slide 38

Slide 38 text

Causal models Causal conditional probabilities d-separation 3 rules of causal calculus

Slide 39

Slide 39 text

PROBABILITY REVIEW
(Venn diagram: All people, Smokers, Hidden gene on, Cancer)

Slide 40

Slide 40 text

p(s) = S / A   (A = all people, S = smokers; letters denote set sizes)

Slide 41

Slide 41 text

p(~s) = (A − S) / A

Slide 42

Slide 42 text

p(s, c) = (S∩C) / A   (C = cancer)

Slide 43

Slide 43 text

p(c|s) = (S∩C) / S
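The four formulas above, evaluated on hypothetical headcounts (all numbers invented for illustration):

```python
# A people in total, S of them smoke, S_and_C both smoke and have cancer.
A = 1000          # all people
S = 400           # smokers
S_and_C = 100     # smokers who have cancer

p_s = S / A                   # p(s)   = 0.4
p_not_s = (A - S) / A         # p(~s)  = 0.6
p_s_and_c = S_and_C / A       # p(s,c) = 0.1
p_c_given_s = S_and_C / S     # p(c|s) = 0.25

# Sanity check: p(c|s) = p(s,c) / p(s), the usual definition
# of conditional probability.
assert abs(p_c_given_s - p_s_and_c / p_s) < 1e-12
```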

Slide 44

Slide 44 text

DEFINE “CAUSE” do(S) C H p(c|do(s)) > p(c|do(~s)) ⇔ smoking causes cancer

Slide 45

Slide 45 text

Causal models Causal conditional probabilities d-separation 3 rules of causal calculus

Slide 46

Slide 46 text

d-separation: if knowing X can tell us something about Y, then X and Y are d-connected. Otherwise, they’re d-separated.

Slide 47

Slide 47 text

d-connected X Y

Slide 48

Slide 48 text

d-connected X Y

Slide 49

Slide 49 text

d-separated X ⊥ Y X Y

Slide 50

Slide 50 text

d-separated: X ⊥ Y
X → Z ← Y   (Z unknown)

Slide 51

Slide 51 text

d-connected
X → Z ← Y   (Z known)
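This collider behavior can be checked numerically: draw independent X and Y, set Z = X + Y, and compare correlations before and after “knowing” Z. Here knowing Z is approximated by selecting samples with Z > 1; the threshold is arbitrary:

```python
import random

random.seed(1)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
zs = [x + y for x, y in zip(xs, ys)]    # collider: X -> Z <- Y

def corr(a, b):
    """Pearson correlation of two equal-length samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# Z unknown: X and Y are d-separated, so correlation is ~0.
print(corr(xs, ys))

# Z "known" (condition on z > 1): X and Y become d-connected,
# and a clear negative correlation appears.
sel = [(x, y) for x, y, z in zip(xs, ys, zs) if z > 1]
print(corr([x for x, _ in sel], [y for _, y in sel]))
```

Unconditionally the correlation is near zero; within the selected subsample it is strongly negative (around −0.6), exactly the d-connection the slide describes.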

Slide 52

Slide 52 text

Causal models Causal conditional probabilities d-separation 3 rules of causal calculus

Slide 53

Slide 53 text

(X ⊥ Y | W, Z) G X Let’s try this notation first. What does it mean?

Slide 54

Slide 54 text

(X ⊥ Y | W, Z) G X Remove all the edges pointing to X in graph G

Slide 55

Slide 55 text

(X ⊥ Y | W, Z) G X Remove all the edges pointing to X in graph G Given that W and Z are known

Slide 56

Slide 56 text

(X ⊥ Y | W, Z) G X Remove all the edges pointing to X in graph G Given that W and Z are known X and Y are d-separated

Slide 57

Slide 57 text

RULE 1: When can we ignore observations?
(Y ⊥ Z | W, X) G_X ⇒ p(y|w, do(x), z) = p(y|w, do(x))

Slide 58

Slide 58 text

RULE 1: When can we ignore observations?
(Y ⊥ Z | W, X) G_X ⇒ p(y|w, do(x), z) = p(y|w, do(x))
Z has no influence on Y, so we can omit z when computing the probability of y.

Slide 59

Slide 59 text

RULE 2: When can we ignore the act of intervention?
(Y ⊥ Z | W, X) G_X,Z ⇒ p(y|w, do(x), do(z)) = p(y|w, do(x), z)

Slide 60

Slide 60 text

RULE 2: When can we ignore the act of intervention?
(Y ⊥ Z | W, X) G_X,Z ⇒ p(y|w, do(x), do(z)) = p(y|w, do(x), z)
If removing the influence of Z makes Y and Z unrelated, it makes no difference whether we intervene on Z or merely observe it. Z HAS NO POWER HERE

Slide 61

Slide 61 text

RULE 3: When can we ignore an intervention variable entirely?
(Y ⊥ Z | W, X) G_X,Z−pa(W) ⇒ p(y|w, do(x), do(z)) = p(y|w, do(x))

Slide 62

Slide 62 text

RULE 3: When can we ignore an intervention variable entirely?
(Y ⊥ Z | W, X) G_X,Z−pa(W) ⇒ p(y|w, do(x), do(z)) = p(y|w, do(x))
If Z only affects known variables (X and W), then Y has nothing to do with Z.

Slide 63

Slide 63 text

Back to the subject... p(c|do(s)) > p(c|do(~s)) ⇔ smoking causes cancer

Slide 64

Slide 64 text

The 3 rules of causal calculus make it possible to derive p(c|do(s)) and p(c|do(~s)) only from observed statistics. No intervention ➭ no dos!

Slide 65

Slide 65 text

Hidden gene → Smoking, Hidden gene → Cancer, Smoking → Tar in lungs → Cancer
Goal: compute p(c|do(s)) without dos

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

p(c|do(s)) = ? S C H T

Slide 68

Slide 68 text

S C H T

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))

Slide 69

Slide 69 text

S C H T
RULE 2: (T ⊥ S) G_S

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))
           = Σ_t p(c|do(s), t)⋅p(t|s)

Slide 70

Slide 70 text

S C H T
RULE 2: (C ⊥ T | S) G_S,T

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))
           = Σ_t p(c|do(s), t)⋅p(t|s)
           = Σ_t p(c|do(s), do(t))⋅p(t|s)

Slide 71

Slide 71 text

S C H T
RULE 3: (C ⊥ S | T) G_S,T

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))
           = Σ_t p(c|do(s), t)⋅p(t|s)
           = Σ_t p(c|do(s), do(t))⋅p(t|s)
           = Σ_t p(c|do(t))⋅p(t|s)

Slide 72

Slide 72 text

S C H T

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))
           = Σ_t p(c|do(s), t)⋅p(t|s)
           = Σ_t p(c|do(s), do(t))⋅p(t|s)
           = Σ_t p(c|do(t))⋅p(t|s)
           = Σ_t (Σ_s′ p(c|do(t), s′)⋅p(s′|do(t)))⋅p(t|s)

Slide 73

Slide 73 text

S C H T
RULE 2: (C ⊥ T | S) G_S,T

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))
           = Σ_t p(c|do(s), t)⋅p(t|s)
           = Σ_t p(c|do(s), do(t))⋅p(t|s)
           = Σ_t p(c|do(t))⋅p(t|s)
           = Σ_t (Σ_s′ p(c|do(t), s′)⋅p(s′|do(t)))⋅p(t|s)
           = Σ_t (Σ_s′ p(c|t, s′)⋅p(s′|do(t)))⋅p(t|s)

Slide 74

Slide 74 text

S C H T
RULE 3: (S ⊥ T) G_T

p(c|do(s)) = Σ_t p(c|do(s), t)⋅p(t|do(s))
           = Σ_t p(c|do(s), t)⋅p(t|s)
           = Σ_t p(c|do(s), do(t))⋅p(t|s)
           = Σ_t p(c|do(t))⋅p(t|s)
           = Σ_t (Σ_s′ p(c|do(t), s′)⋅p(s′|do(t)))⋅p(t|s)
           = Σ_t (Σ_s′ p(c|t, s′)⋅p(s′|do(t)))⋅p(t|s)
           = Σ_t (Σ_s′ p(c|t, s′)⋅p(s′))⋅p(t|s)

Slide 75

Slide 75 text

TOY MODEL

             Tar                    No tar
Smoker       47.5% (85% cancer)     2.5% (90% cancer)
Non-smoker    2.5% (5% cancer)      47.5% (10% cancer)

p(c|do(s)) = 45.25%
p(c|do(~s)) = 49.75%
➭ Smoking reduces chance of getting cancer! *

* This is not true, because the numbers we’re using are not real.
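The percentages follow from the final do-free formula in the derivation, p(c|do(s)) = Σ_t p(t|s)⋅(Σ_s′ p(c|t, s′)⋅p(s′)), reading the table cells as population fractions. A short script to check the arithmetic:

```python
# Front-door computation of p(c|do(s)) from the toy-model table above.
# Keys are (smoker, tar); values are population fractions / cancer rates.
frac = {(True, True): 0.475, (True, False): 0.025,
        (False, True): 0.025, (False, False): 0.475}
p_cancer = {(True, True): 0.85, (True, False): 0.90,
            (False, True): 0.05, (False, False): 0.10}

def p_s(s):                      # marginal p(s)
    return sum(frac[(s, t)] for t in (True, False))

def p_t_given_s(t, s):           # p(t|s)
    return frac[(s, t)] / p_s(s)

def p_c_do_s(s):
    """p(c|do(s)) via the do-free front-door formula."""
    total = 0.0
    for t in (True, False):
        inner = sum(p_cancer[(s2, t)] * p_s(s2) for s2 in (True, False))
        total += p_t_given_s(t, s) * inner
    return total

print(p_c_do_s(True))    # 0.4525 -> 45.25%
print(p_c_do_s(False))   # 0.4975 -> 49.75%
```

This gives 45.25% for p(c|do(s)) and 49.75% for p(c|do(~s)), so with these made-up numbers smoking indeed lowers the cancer rate.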

Slide 76

Slide 76 text

CONCLUSION

Slide 77

Slide 77 text

Correlation by itself tells us nothing about causality. Remember Simpson’s paradox.

Slide 78

Slide 78 text

Pearl’s theory of causality can be used to infer causal relationships without experimental intervention. But NOT always!

Slide 79

Slide 79 text

QUESTIONS?