If correlation doesn’t imply causality, then what does?

Presentation on Michael Nielsen’s insightful article of the same title.

http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/

Chang-Hung Liang

June 03, 2015

Transcript

  1. REFERENCE: “If correlation doesn’t imply causation, then what does?” by Michael Nielsen (quantum physics, quantum computing, machine learning, neural network, deep learning)
    http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
  2. WAS BERKELEY SEXIST?

                     Applicants   Admitted
        Men          8442         44%
        Women        4321         35%

    Data source: admissions for the fall of 1973 at UC Berkeley
  3. But if you look closely...

        Department   Men                      Women
                     Applicants   Admitted    Applicants   Admitted
        A            825          62%         108          82%
        B            560          63%         25           68%
        C            325          37%         593          34%
        D            417          33%         375          35%
        E            191          28%         393          24%
        F            272          6%          341          7%
        Total        8442         44%         4321         35%
  4. In most departments, the admission rate for women is actually higher than for men.

        Department   Men                      Women
                     Applicants   Admitted    Applicants   Admitted
        A            825          62%         108          82%
        B            560          63%         25           68%
        C            325          37%         593          34%
        D            417          33%         375          35%
        E            191          28%         393          24%
        F            272          6%          341          7%
        Total        8442         44%         4321         35%
  5.
        Department   Men                      Women
                     Applicants   Admitted    Applicants   Admitted
        A            825          62%         108          82%
        B            560          63%         25           68%
        C            325          37%         593          34%
        D            417          33%         375          35%
        E            191          28%         393          24%
        F            272          6%          341          7%
        Total        8442         44%         4321         35%

    Women tended to apply to competitive departments with low rates of admission.
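
The reversal in the table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the figures are copied from the department rows above, the names `departments` and `pooled_rate` are made up for this example, and admitted counts are approximated from the rounded percentages. The six listed departments account for only part of the applicants in the Total row, so the pooled rates computed here cover these six departments only.

```python
# Department rows copied from the table above:
# (male applicants, male admit rate, female applicants, female admit rate)
departments = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (272, 0.06, 341, 0.07),
}


def pooled_rate(rows):
    """Admission rate over all rows: total admitted / total applicants."""
    applicants = sum(n for n, _ in rows)
    admitted = sum(n * rate for n, rate in rows)  # approximate, rates are rounded
    return admitted / applicants


men_pooled = pooled_rate([(m_n, m_r) for m_n, m_r, _, _ in departments.values()])
women_pooled = pooled_rate([(w_n, w_r) for _, _, w_n, w_r in departments.values()])

# Departments in which women were admitted at a higher rate than men.
women_ahead = [d for d, (_, m_r, _, w_r) in departments.items() if w_r > m_r]

print(f"pooled over six departments: men {men_pooled:.1%}, women {women_pooled:.1%}")
print("women ahead in departments:", women_ahead)
```

Women lead in four of the six departments, yet the pooled rate favours men, because most women applied to the departments with the lowest admission rates.
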
  6. “I often wonder how many people with real decision-making power – politicians, judges, and so on – are making decisions based on statistical studies, and yet they don’t understand even basic things like Simpson’s paradox.” - Michael Nielsen
  7. In theory, to prove whether smoking causes lung cancer, we need to run a randomized controlled experiment.
  8. After the experiment, we measure the percentage of people who get lung cancer: $P_1$ for smokers and $P_2$ for non-smokers. If $P_1 > P_2$, then we can say from the statistics that smoking causes lung cancer.
  9. By forcing people (not) to smoke at random, you eliminate the hidden factor that may cause people to smoke.
    [Diagram labels: smoking, cancer, hidden genetic factor, Experimenter]
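
A small simulation can make the role of randomization concrete. Everything in this sketch is an assumption chosen for illustration: the function name `simulate`, the probabilities, and above all the toy world itself, in which a hidden gene raises both the urge to smoke and the cancer risk while smoking has no effect on cancer at all. It is not a model of real data.

```python
import random


def simulate(n, randomized, seed=0):
    """Return (P1, P2): cancer rates among smokers and non-smokers.

    Toy world (all probabilities are made up): a hidden gene raises both the
    chance of smoking and the chance of cancer; smoking itself does nothing.
    """
    rng = random.Random(seed)
    people = {True: 0, False: 0}   # smoker? -> number of people
    cases = {True: 0, False: 0}    # smoker? -> number of cancer cases
    for _ in range(n):
        gene = rng.random() < 0.5
        if randomized:
            smokes = rng.random() < 0.5                     # experimenter decides
        else:
            smokes = rng.random() < (0.8 if gene else 0.2)  # gene decides
        cancer = rng.random() < (0.6 if gene else 0.1)      # depends on gene only
        people[smokes] += 1
        cases[smokes] += cancer
    return cases[True] / people[True], cases[False] / people[False]


obs_p1, obs_p2 = simulate(200_000, randomized=False)
exp_p1, exp_p2 = simulate(200_000, randomized=True)
print(f"observed:   P1 = {obs_p1:.3f}, P2 = {obs_p2:.3f}")
print(f"randomized: P1 = {exp_p1:.3f}, P2 = {exp_p2:.3f}")
```

In the observational run P1 is far above P2 even though smoking is harmless in this toy world; once smoking is assigned at random the two rates come out nearly equal, because the hidden gene no longer influences who smokes.
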
  10. Luckily, Pearl developed a causal calculus allowing us to make inferences without doing any inhuman experiments.
  11. PARENT NOTATION (graph nodes: S, C, H)
    pa(H) = { }, pa(C) = { H, S }, pa(S) = { H }
  12. PERTURBED GRAPH: $G$ and $G_{\overline{S}}$ (each drawn with nodes S, C, H)
    Remember the “inhuman experiment”?
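
One way to picture the perturbed graph is as an operation on parent sets. The sketch below assumes the graph is stored as a dictionary from each node to its set of parents, using the values pa(H) = { }, pa(C) = { H, S }, pa(S) = { H } given in the parent notation; the helper name `remove_incoming` is hypothetical.

```python
# Parent sets from the parent notation: pa(H) = {}, pa(S) = {H}, pa(C) = {H, S}
G = {"H": set(), "S": {"H"}, "C": {"H", "S"}}


def remove_incoming(parents, nodes):
    """Return a copy of the graph with every edge pointing into `nodes` removed."""
    return {v: (set() if v in nodes else set(ps)) for v, ps in parents.items()}


# Perturbed graph for S: S no longer listens to H, as if an experimenter
# had decided at random who smokes.
G_bar_S = remove_incoming(G, {"S"})
print(G_bar_S)
```

The original graph is left untouched, so both `G` and `G_bar_S` remain available for comparison.
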
  13. d-separation: If X can tell us about Y, then X and Y are d-connected. Otherwise, they’re d-separated.
  14. $(X \perp Y \mid W, Z)_{G_{\overline{X}}}$
    Let’s try this notation first. What does it mean?
  15. $(X \perp Y \mid W, Z)_{G_{\overline{X}}}$
    $G_{\overline{X}}$: remove all the edges pointing to X in graph G.
  16. $(X \perp Y \mid W, Z)_{G_{\overline{X}}}$
    $G_{\overline{X}}$: remove all the edges pointing to X in graph G.
    $\mid W, Z$: given that W and Z are known.
  17. $(X \perp Y \mid W, Z)_{G_{\overline{X}}}$
    $G_{\overline{X}}$: remove all the edges pointing to X in graph G.
    $\mid W, Z$: given that W and Z are known.
    $X \perp Y$: X and Y are d-separated.
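
A statement of this form can be checked mechanically. The sketch below is not taken from the slides: it uses a standard test for d-separation (keep only the ancestors of the nodes involved, connect parents that share a child, drop edge directions, delete the known nodes, and look for a remaining path). The function names are hypothetical, and the example graph is the S, C, H graph from the parent notation with the edges pointing to S already removed.

```python
def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def d_separated(parents, xs, ys, given=()):
    """True if every node in xs is d-separated from every node in ys given `given`."""
    xs, ys, given = set(xs), set(ys), set(given)
    keep = ancestors_of(parents, xs | ys | given)
    # Build the undirected "moral" graph on the kept nodes.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents[child])
        for p in ps:                      # child - parent links
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, a in enumerate(ps):        # links between parents of one child
            for b in ps[i + 1:]:
                nbrs[a].add(b)
                nbrs[b].add(a)
    # Search outward from xs without passing through the known nodes.
    seen, stack = set(), [x for x in xs if x not in given]
    while stack:
        v = stack.pop()
        if v in seen or v in given:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return not (seen & ys)


G = {"H": set(), "S": {"H"}, "C": {"H", "S"}}          # original graph
G_bar_S = {"H": set(), "S": set(), "C": {"H", "S"}}    # edges into S removed

print(d_separated(G, {"S"}, {"H"}))                # S and H in G
print(d_separated(G_bar_S, {"S"}, {"H"}))          # S and H after removing edges into S
print(d_separated(G_bar_S, {"S"}, {"H"}, {"C"}))   # same, but with C known
```

In the perturbed graph S and H are d-separated as long as C is unknown; once C is known they become d-connected, since C is a common effect of both.
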
  18. RULE 1: When can we ignore observations?
    $(Y \perp Z \mid W, X)_{G_{\overline{X}}} \Rightarrow p(y \mid w, do(x), z) = p(y \mid w, do(x))$
  19. RULE 1: When can we ignore observations?
    $(Y \perp Z \mid W, X)_{G_{\overline{X}}} \Rightarrow p(y \mid w, do(x), z) = p(y \mid w, do(x))$
    Z has no influence on Y, so we can omit z when calculating the probability of y.
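  20. RULE 2: When can we ignore the act of intervention?
    $(Y \perp Z \mid W, X)_{G_{\overline{X}, \underline{Z}}} \Rightarrow p(y \mid w, do(x), do(z)) = p(y \mid w, do(x), z)$
    (In $G_{\overline{X}, \underline{Z}}$, the edges pointing to X and the edges pointing out of Z are removed.)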
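  21. RULE 2: When can we ignore the act of intervention?
    $(Y \perp Z \mid W, X)_{G_{\overline{X}, \underline{Z}}} \Rightarrow p(y \mid w, do(x), do(z)) = p(y \mid w, do(x), z)$
    (In $G_{\overline{X}, \underline{Z}}$, the edges pointing to X and the edges pointing out of Z are removed.)
    If removing the influence of Z makes Y and Z unrelated, it makes no difference whether we intervene on Z or merely observe it. Z HAS NO POWER HERE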
  22. RULE 3: When can we ignore an intervention variable entirely?
    $(Y \perp Z \mid W, X)_{G_{\overline{X}, \overline{Z - pa(W)}}} \Rightarrow p(y \mid w, do(x), do(z)) = p(y \mid w, do(x))$
  23. RULE 3: When can we ignore an intervention variable entirely?
    $(Y \perp Z \mid W, X)_{G_{\overline{X}, \overline{Z - pa(W)}}} \Rightarrow p(y \mid w, do(x), do(z)) = p(y \mid w, do(x))$
    If Z only affects known variables (X and W), Y has nothing to do with Z.
  24. The 3 rules of causal calculus make it possible to derive p(c|do(s)) and p(c|do(~s)) only from observed statistics. No intervention ➭ no dos!
  25. $p(c \mid do(s)) = \sum_t p(c \mid do(s), t) \, p(t \mid do(s))$
    $= \sum_t p(c \mid do(s), t) \, p(t \mid s)$
    RULE 2: $(T \perp S)_{G_{\underline{S}}}$ [graph nodes: S, C, H, T]
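  26. $p(c \mid do(s)) = \sum_t p(c \mid do(s), t) \, p(t \mid do(s))$
    $= \sum_t p(c \mid do(s), t) \, p(t \mid s)$
    $= \sum_t p(c \mid do(s), do(t)) \, p(t \mid s)$
    RULE 2: $(C \perp T \mid S)_{G_{\overline{S}, \underline{T}}}$ [graph nodes: S, C, H, T]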
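  27. $p(c \mid do(s)) = \sum_t p(c \mid do(s), t) \, p(t \mid do(s))$
    $= \sum_t p(c \mid do(s), t) \, p(t \mid s)$
    $= \sum_t p(c \mid do(s), do(t)) \, p(t \mid s)$
    $= \sum_t p(c \mid do(t)) \, p(t \mid s)$
    RULE 3: $(C \perp S \mid T)_{G_{\overline{S}, \overline{T}}}$ [graph nodes: S, C, H, T]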
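  28. $p(c \mid do(s)) = \sum_t p(c \mid do(s), t) \, p(t \mid do(s))$
    $= \sum_t p(c \mid do(s), t) \, p(t \mid s)$
    $= \sum_t p(c \mid do(s), do(t)) \, p(t \mid s)$
    $= \sum_t p(c \mid do(t)) \, p(t \mid s)$
    $= \sum_t \left( \sum_{s'} p(c \mid do(t), s') \, p(s' \mid do(t)) \right) p(t \mid s)$
    [graph nodes: S, C, H, T]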
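  29. $p(c \mid do(s)) = \sum_t p(c \mid do(s), t) \, p(t \mid do(s))$
    $= \sum_t p(c \mid do(s), t) \, p(t \mid s)$
    $= \sum_t p(c \mid do(s), do(t)) \, p(t \mid s)$
    $= \sum_t p(c \mid do(t)) \, p(t \mid s)$
    $= \sum_t \left( \sum_{s'} p(c \mid do(t), s') \, p(s' \mid do(t)) \right) p(t \mid s)$
    $= \sum_t \left( \sum_{s'} p(c \mid t, s') \, p(s' \mid do(t)) \right) p(t \mid s)$
    RULE 2: $(C \perp T \mid S)_{G_{\underline{T}}}$ [graph nodes: S, C, H, T]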
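  30. $p(c \mid do(s)) = \sum_t p(c \mid do(s), t) \, p(t \mid do(s))$
    $= \sum_t p(c \mid do(s), t) \, p(t \mid s)$
    $= \sum_t p(c \mid do(s), do(t)) \, p(t \mid s)$
    $= \sum_t p(c \mid do(t)) \, p(t \mid s)$
    $= \sum_t \left( \sum_{s'} p(c \mid do(t), s') \, p(s' \mid do(t)) \right) p(t \mid s)$
    $= \sum_t \left( \sum_{s'} p(c \mid t, s') \, p(s' \mid do(t)) \right) p(t \mid s)$
    $= \sum_t \left( \sum_{s'} p(c \mid t, s') \, p(s') \right) p(t \mid s)$
    RULE 3: $(S \perp T)_{G_{\overline{T}}}$ [graph nodes: S, C, H, T]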
  31. TOY MODEL
    Each cell gives the group’s share of the population and the cancer rate within that group.

                      Tar                    No tar
        Smoker        47.5% (85% cancer)     2.5% (90% cancer)
        Non-smoker    2.5% (5% cancer)       47.5% (10% cancer)

    p(c|do(s)) = 45.25%
    p(c|do(~s)) = 49.75%
    ➭ Smoking reduces the chance of getting cancer!*
    * This is not true, because the numbers we’re using are not real.
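
The toy-model figures can be reproduced by plugging the table into the last line of the derivation, p(c|do(s)) = Σ_t p(t|s) · Σ_s' p(c|t, s') · p(s'). The sketch below is an illustration only: it reads each cell of the toy table as (share of population, cancer rate within the group), and all function names are made up for this example.

```python
# Toy table: (smoker?, tar?) -> (share of population, cancer rate in that group)
groups = {
    (True, True): (0.475, 0.85),
    (True, False): (0.025, 0.90),
    (False, True): (0.025, 0.05),
    (False, False): (0.475, 0.10),
}
BOTH = (True, False)


def p_s(s):
    """p(s): share of the population that smokes (or does not)."""
    return sum(groups[(s, t)][0] for t in BOTH)


def p_t_given_s(t, s):
    """p(t | s): tar status given smoking status."""
    return groups[(s, t)][0] / p_s(s)


def p_c_given_ts(t, s):
    """p(c | t, s): cancer rate read directly from the table."""
    return groups[(s, t)][1]


def p_c_do_s(s):
    """p(c | do(s)) = sum_t p(t|s) * sum_s' p(c|t,s') * p(s')."""
    return sum(
        p_t_given_s(t, s) * sum(p_c_given_ts(t, s2) * p_s(s2) for s2 in BOTH)
        for t in BOTH
    )


def p_c_given_s(s):
    """Plain conditional p(c | s), for comparison with the do() version."""
    return sum(p_t_given_s(t, s) * p_c_given_ts(t, s) for t in BOTH)


print(f"p(c|do(s))  = {p_c_do_s(True):.4f}")
print(f"p(c|do(~s)) = {p_c_do_s(False):.4f}")
print(f"p(c|s)      = {p_c_given_s(True):.4f}")
print(f"p(c|~s)     = {p_c_given_s(False):.4f}")
```

With these made-up numbers the formula gives 45.25% under do(s) and 49.75% under do(~s), while the plain conditionals are 85.25% for smokers and 9.75% for non-smokers: the observed correlation and the computed causal effect point in opposite directions.
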
  32. Pearl’s theory of causality can be used to infer causal relationships without experimental intervention. But NOT always!