Slide 1

Slide 1 text

Bayesian Statistical Analysis: A Gentle Introduction. Center for Quantitative Sciences Workshop, 18 November 2011. Christopher J. Fonnesbeck. Monday, December 5, 11

Slide 2

Slide 2 text

What is Bayesian Inference?

Slide 3

Slide 3 text

Practical methods for making inferences from data using probability models for quantities we observe and about which we wish to learn. Gelman et al., 2004

Slide 4

Slide 4 text

Rev. Thomas Bayes

Slide 5

Slide 5 text

Rev. Thomas Bayes; Simon Laplace

Slide 6

Slide 6 text

Conclusions in terms of probability statements: $p(\theta \mid y)$, where $\theta$ = unknowns and $y$ = observations

Slide 7

Slide 7 text

Classical inference conditions on the unknown parameter: $p(y \mid \theta)$, where $\theta$ = unknowns and $y$ = observations

Slide 8

Slide 8 text

Classical vs Bayesian Statistics

Slide 9

Slide 9 text

Frequentist

Slide 10

Slide 10 text

Frequentist: observations random

Slide 11

Slide 11 text

Frequentist: model, parameters fixed

Slide 12

Slide 12 text

Frequentist Inference

Slide 13

Slide 13 text

Choose an estimator based on frequentist (asymptotic) criteria: $\hat{\mu} = \frac{\sum x_i}{n}$

Slide 14

Slide 14 text

Choose a test statistic based on frequentist (asymptotic) criteria: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
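
The estimator and the test statistic from the last two slides can be sketched in a few lines of Python. The sample `x` and the hypothesized mean `mu_0` below are made-up values for illustration only; they do not come from the slides.

```python
import math
import statistics

# Hypothetical sample and hypothesized mean (illustrative values only).
x = [5.1, 4.9, 5.6, 5.8, 5.2, 4.7, 5.5, 5.3]
mu_0 = 5.0

n = len(x)
mu_hat = sum(x) / n                       # estimator: the sample mean
s = statistics.stdev(x)                   # sample standard deviation (n - 1 denominator)
t = (mu_hat - mu_0) / (s / math.sqrt(n))  # one-sample t statistic

print(f"mu_hat = {mu_hat:.4f}, s = {s:.4f}, t = {t:.4f}")
```

Both quantities are functions of the data alone; the parameter enters only through the fixed hypothesized value.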

Slide 15

Slide 15 text

Bayesian

Slide 16

Slide 16 text

Bayesian: observations fixed

Slide 17

Slide 17 text

Bayesian: model, parameters “random”

Slide 18

Slide 18 text

Components of Bayesian Statistics

Slide 19

Slide 19 text

1. Specify full probability model: $\Pr(y \mid \theta)\,\Pr(\theta \mid \Theta)\,\Pr(\Theta)$

Slide 20

Slide 20 text

data $y$

Slide 21

Slide 21 text

data $y$, covariates $X$

Slide 22

Slide 22 text

data $y$, covariates $X$, parameters $\theta$

Slide 23

Slide 23 text

data $y$, covariates $X$, parameters $\theta$, missing data $\tilde{y}$

Slide 24

Slide 24 text

2. Calculate posterior distribution: $\Pr(\theta \mid y)$

Slide 25

Slide 25 text

3. Check model for lack of fit

Slide 26

Slide 26 text

Why Bayes?

Slide 27

Slide 27 text

“... the Bayesian approach is attractive because it is useful. Its usefulness derives in large measure from its simplicity. Its simplicity allows the investigation of far more complex models than can be handled by the tools in the classical toolbox.” Link and Barker (2010)

Slide 28

Slide 28 text

coherence [diagram relating $X$, $\tilde{y}$, $y$, and $\theta$]

Slide 29

Slide 29 text

Interpretation

Slide 30

Slide 30 text

Confidence Interval: $\Pr\left(\bar{Y} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$; in general, $\Pr(a(Y) < \theta < b(Y) \mid \theta) = 0.95$
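
The confidence-interval statement conditions on the parameter and treats the data as random, which a small simulation makes concrete. This is a sketch under assumed values: `mu`, `sigma`, `n`, and the number of replicates are hypothetical, and the data are taken to be normal with known `sigma`.

```python
import math
import random

random.seed(2011)

# Hypothetical "true" values, held fixed across replicate data sets.
mu, sigma, n = 10.0, 2.0, 25
n_reps = 5000
half_width = 1.96 * sigma / math.sqrt(n)

covered = 0
for _ in range(n_reps):
    # Draw a fresh data set: the data are random, the parameter is not.
    y = [random.gauss(mu, sigma) for _ in range(n)]
    y_bar = sum(y) / n
    if y_bar - half_width < mu < y_bar + half_width:
        covered += 1

coverage = covered / n_reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The 0.95 describes the long-run behaviour of the interval-building procedure over repeated data sets, not the probability that any single computed interval contains the parameter.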

Slide 31

Slide 31 text

Credible Interval: $\Pr(a(y) < \theta < b(y) \mid Y = y) = 0.95$

Slide 32

Slide 32 text

Uncertainty

Slide 33

Slide 33 text

Complex Models [figure: model graphs with nodes including alpha, beta, mu, pi, psi, a_psi, b_psi, z, N, Ntotal, Ndist, and occupied]

Slide 34

Slide 34 text

Probability

Slide 35

Slide 35 text

(1) classical: $\Pr(A) = \frac{m}{n}$, where $A$ = an event of interest, $m$ = no. of favourable outcomes, $n$ = total no. of possible outcomes

Slide 36

Slide 36 text

all elementary events are equally likely

Slide 37

Slide 37 text

(2) frequentist: $\Pr(A) = \lim_{n \to \infty} \frac{m}{n}$, where $n$ = no. of identical and independent trials, $m$ = no. of times $A$ has occurred

Slide 38

Slide 38 text

Between 1745 and 1770 there were 241,945 girls and 251,527 boys born in Paris
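
Under the frequentist definition, the Paris birth counts give a relative frequency m/n for the event "a birth is a girl". The sketch below just evaluates that ratio; treating each birth as an identical, independent trial is an assumption of the illustration.

```python
# Birth counts quoted on the slide.
girls = 241_945
boys = 251_527

n = girls + boys        # number of trials (births)
m = girls               # number of times the event "girl" occurred
freq_girl = m / n       # relative frequency m / n

print(f"n = {n}, relative frequency of girls = {freq_girl:.4f}")
```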

Slide 39

Slide 39 text

A = “Chris has Type A blood”

Slide 40

Slide 40 text

A = “Titans will win Super Bowl XLVI”

Slide 41

Slide 41 text

A = “The prevalence of diabetes in Nashville is > 0.15”

Slide 42

Slide 42 text

(3) subjective: Pr(A)

Slide 43

Slide 43 text

Pr(A): measure of one’s uncertainty regarding the occurrence of A

Slide 44

Slide 44 text

Pr(A|H)

Slide 45

Slide 45 text

A = “It is raining in Atlanta”

Slide 46

Slide 46 text

Pr(A|H) = 0.5

Slide 47

Slide 47 text

$\Pr(A \mid H) = \begin{cases} 0.4 & \text{if raining in Nashville} \\ 0.25 & \text{otherwise} \end{cases}$

Slide 48

Slide 48 text

$\Pr(A \mid H) = \begin{cases} 1 & \text{if raining} \\ 0 & \text{otherwise} \end{cases}$

Slide 49

Slide 49 text

[Venn diagram: event $A$ inside sample space $S$] $\Pr(A) = \frac{\text{area of } A}{\text{area of } S}$

Slide 50

Slide 50 text

[Venn diagram: events $A$ and $B$ overlapping in $A \cap B$, inside $S$] $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$

Slide 51

Slide 51 text

[Venn diagram: $A \cap B$ inside $A$] $\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$

Slide 52

Slide 52 text

[Venn diagram: $A \cap B$ inside $A$] conditional probability: $\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$

Slide 53

Slide 53 text

Independence: $\Pr(B \mid A) = \Pr(B)$

Slide 54

Slide 54 text

[Venn diagram: $A$, $B$, and $A \cap B$ inside $S$] $\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$

Slide 55

Slide 55 text

[Venn diagram: $A$, $B$, and $A \cap B$ inside $S$] $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$, $\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$

Slide 56

Slide 56 text

$\Pr(A \cap B) = \Pr(A \mid B)\Pr(B) = \Pr(B \mid A)\Pr(A)$

Slide 57

Slide 57 text

Bayes Theorem: $\Pr(B \mid A) = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A)}$
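
The addition rule, the definition of conditional probability, and Bayes' theorem can all be checked on a small sample space with equally likely outcomes. The two-dice setup and the events `A` and `B` below are a hypothetical illustration, not an example from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes of two fair dice (equally likely).
S = set(product(range(1, 7), repeat=2))
A = {o for o in S if sum(o) >= 10}   # event A: the total is at least 10
B = {o for o in S if o[0] == 6}      # event B: the first die shows 6


def pr(event):
    """Classical probability: favourable outcomes over possible outcomes."""
    return Fraction(len(event), len(S))


def pr_given(event, given):
    """Conditional probability Pr(event | given)."""
    return pr(event & given) / pr(given)


# Addition rule: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
# Product rule, written both ways
assert pr(A & B) == pr_given(A, B) * pr(B) == pr_given(B, A) * pr(A)
# Bayes' theorem
assert pr_given(B, A) == pr_given(A, B) * pr(B) / pr(A)

print("Pr(A) =", pr(A), " Pr(B) =", pr(B), " Pr(B|A) =", pr_given(B, A))
```

Using `Fraction` keeps every probability exact, so the identities hold with equality rather than up to rounding.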

Slide 58

Slide 58 text

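Bayes Theorem: $\Pr(\theta \mid y) = \frac{\Pr(y \mid \theta)\Pr(\theta)}{\Pr(y)}$, where $\Pr(\theta \mid y)$ is the posterior probability, $\Pr(\theta)$ the prior probability, $\Pr(y \mid \theta)$ the likelihood of the observations, and $\Pr(y)$ the normalizing constant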

Slide 59

Slide 59 text

Bayes Theorem: $\Pr(\theta \mid y) = \frac{\Pr(y \mid \theta)\Pr(\theta)}{\int \Pr(y \mid \theta)\Pr(\theta)\,d\theta}$

Slide 60

Slide 60 text

“proportional to”: $\Pr(\theta \mid y) \propto \Pr(y \mid \theta)\Pr(\theta)$

Slide 61

Slide 61 text

$\Pr(\theta \mid y) \propto \Pr(y \mid \theta)\Pr(\theta)$: posterior $\propto$ likelihood $\times$ prior

Slide 62

Slide 62 text

information: $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$

Slide 63

Slide 63 text

“Following observation of $y$, the likelihood $L(\theta \mid y)$ contains all experimental information from $y$ about the unknown $\theta$.”

Slide 64

Slide 64 text

binomial model, sampling distribution of $X$ (data $X$, parameter $\theta$): $p(X \mid \theta) = \binom{N}{x} \theta^{x} (1 - \theta)^{N - x}$

Slide 65

Slide 65 text

binomial model, likelihood function for $\theta$: $L(\theta \mid X) = \binom{N}{x} \theta^{x} (1 - \theta)^{N - x}$
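
The same binomial formula serves as a sampling distribution when θ is held fixed and as a likelihood when the data are held fixed. The sketch below takes the second view and evaluates the likelihood over a grid of θ values; `N = 10` and `x = 3` are hypothetical data chosen for illustration.

```python
import math


def binomial_likelihood(theta, x, N):
    """L(theta | x): binomial probability of x successes in N trials."""
    return math.comb(N, x) * theta**x * (1 - theta) ** (N - x)


# Hypothetical data: 3 successes in 10 trials.
N, x = 10, 3

# Hold the data fixed and vary theta over a grid on [0, 1].
grid = [i / 100 for i in range(101)]
likelihood = [binomial_likelihood(theta, x, N) for theta in grid]

# Grid value with the highest likelihood; it sits at the sample proportion x / N.
theta_max = grid[likelihood.index(max(likelihood))]
print(f"likelihood is maximized near theta = {theta_max}")
```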

Slide 66

Slide 66 text

prior distribution: p(θ|y) ∝ p(y|θ)p(θ)
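
One way to turn "posterior ∝ likelihood × prior" into numbers is a grid approximation, in which the normalizing integral becomes a sum. This is a minimal sketch that assumes a binomial likelihood with hypothetical data (`N = 10`, `x = 3`) and a uniform prior on θ; none of these choices comes from the slides.

```python
import math

# Hypothetical data: 3 successes in 10 trials.
N, x = 10, 3

# Midpoints of equal-width cells covering [0, 1].
n_grid = 1000
width = 1.0 / n_grid
grid = [(i + 0.5) * width for i in range(n_grid)]


def likelihood(theta):
    return math.comb(N, x) * theta**x * (1 - theta) ** (N - x)


prior = [1.0 for _ in grid]                       # assumed uniform prior density
unnormalized = [likelihood(t) * p for t, p in zip(grid, prior)]

# Midpoint-rule approximation of the normalizing constant (integral over theta).
evidence = sum(unnormalized) * width
posterior = [u / evidence for u in unnormalized]  # posterior density on the grid

posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * width
print(f"evidence = {evidence:.4f}, posterior mean = {posterior_mean:.4f}")
```

With a uniform prior this posterior is a Beta(x + 1, N - x + 1) density, which gives an analytic check on the grid result.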

Slide 67

Slide 67 text

Prior as population distribution

Slide 68

Slide 68 text


Slide 69

Slide 69 text

Prior as information state

Slide 70

Slide 70 text


Slide 71

Slide 71 text

All plausible values

Slide 72

Slide 72 text

Between 1745 and 1770 there were 241,945 girls and 251,527 boys born in Paris
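
As a sketch of how a prior covering all plausible values combines with these counts, assume θ is the probability that a birth is a girl, a uniform Beta(1, 1) prior, and a binomial likelihood. The prior choice is an assumption of this illustration, not something stated on the slide. Under those assumptions the posterior is Beta(girls + 1, boys + 1).

```python
import math

# Birth counts quoted on the slide.
girls, boys = 241_945, 251_527

# Assumed uniform Beta(1, 1) prior; the conjugate update gives Beta(a, b).
a = girls + 1
b = boys + 1

post_mean = a / (a + b)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Distance of theta = 0.5 from the posterior mean, in posterior standard deviations.
z = (0.5 - post_mean) / post_sd

print(f"posterior mean = {post_mean:.4f}, posterior sd = {post_sd:.5f}, z = {z:.1f}")
```

With this much data the posterior is very concentrated, so the value 0.5 lies many posterior standard deviations above the posterior mean.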

Slide 73

Slide 73 text

Bayesian analysis is subjective

Slide 74

Slide 74 text

Statistical analysis is subjective

Slide 75

Slide 75 text

“... all forms of statistical inference make assumptions, assumptions which can only be tested very crudely and can almost never be verified.” - Robert E. Kass

Slide 76

Slide 76 text

3. Model checking

Slide 77

Slide 77 text

separation [figure: p(x) plotted against x]

Slide 78

Slide 78 text

source: Gelman et al. 2008

Slide 79

Slide 79 text

weakly-informative prior [figure: Pr(x) plotted against xrange]

Slide 80

Slide 80 text

source: Gelman et al. 2008

Slide 81

Slide 81 text

example: genetic probabilities

Slide 82

Slide 82 text

X-linked recessive

Slide 83

Slide 83 text


Slide 84

Slide 84 text

[pedigree diagram: Woman, Husband, Brother, Mother; legend: affected, carrier, no gene, unknown] Is the woman a carrier?

Slide 85

Slide 85 text

$\Pr(\theta = 1) = \Pr(\theta = 0) = \frac{1}{2}$; prior odds: $\frac{\Pr(\theta = 1)}{\Pr(\theta = 0)} = 1$

Slide 86

Slide 86 text

[pedigree diagram: Woman, Husband, Brother, Mother, Son, Son; legend: affected, carrier, no gene, unknown]

Slide 87

Slide 87 text

$\Pr(y_1 = 0, y_2 = 0 \mid \theta = 1) = (0.5)(0.5) = 0.25$

Slide 88

Slide 88 text

$\Pr(y_1 = 0, y_2 = 0 \mid \theta = 1) = (0.5)(0.5) = 0.25$; $\Pr(y_1 = 0, y_2 = 0 \mid \theta = 0) = 1$

Slide 89

Slide 89 text

$\Pr(y_1 = 0, y_2 = 0 \mid \theta = 1) = (0.5)(0.5) = 0.25$; $\Pr(y_1 = 0, y_2 = 0 \mid \theta = 0) = 1$; “likelihood ratio”: $\frac{p(y_1 = 0, y_2 = 0 \mid \theta = 1)}{p(y_1 = 0, y_2 = 0 \mid \theta = 0)} = \frac{0.25}{1} = 1/4$

Slide 90

Slide 90 text

what about Mom?

Slide 91

Slide 91 text

what about Mom? $y = \{y_1 = 0, y_2 = 0\}$; $\Pr(\theta = 1 \mid y) = \frac{\Pr(y \mid \theta = 1)\Pr(\theta = 1)}{\Pr(y)} = \frac{\Pr(y \mid \theta = 1)\Pr(\theta = 1)}{\sum_{\theta} \Pr(y \mid \theta)\Pr(\theta)}$

Slide 92

Slide 92 text

$y = \{y_1 = 0, y_2 = 0\}$

Slide 93

Slide 93 text

$y = \{y_1 = 0, y_2 = 0\}$; $\Pr(\theta = 1 \mid y) = \frac{p(y \mid \theta = 1)\Pr(\theta = 1)}{p(y \mid \theta = 1)\Pr(\theta = 1) + p(y \mid \theta = 0)\Pr(\theta = 0)}$

Slide 94

Slide 94 text

$y = \{y_1 = 0, y_2 = 0\}$; $\Pr(\theta = 1 \mid y) = \frac{p(y \mid \theta = 1)\Pr(\theta = 1)}{p(y \mid \theta = 1)\Pr(\theta = 1) + p(y \mid \theta = 0)\Pr(\theta = 0)} = \frac{(0.25)(0.5)}{(0.25)(0.5) + (1.0)(0.5)} = \frac{0.125}{0.625} = 0.2$
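
The same posterior can be reached in odds form: posterior odds equal prior odds times the likelihood ratio. The sketch below uses the prior odds and the likelihood ratio from the earlier slides; only the variable names are introduced here.

```python
from fractions import Fraction

# Prior odds that the woman is a carrier: Pr(theta = 1) / Pr(theta = 0).
prior_odds = Fraction(1, 2) / Fraction(1, 2)

# Likelihood ratio for two unaffected sons: 0.25 under carrier, 1 under non-carrier.
likelihood_ratio = Fraction(1, 4) / Fraction(1, 1)

posterior_odds = prior_odds * likelihood_ratio

# Convert odds back to a probability.
posterior_prob = posterior_odds / (1 + posterior_odds)

print("posterior odds =", posterior_odds, " Pr(theta = 1 | y) =", float(posterior_prob))
```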

Slide 95

Slide 95 text

3rd unaffected son? Using the posterior from the previous step as the prior: $\Pr(\theta = 1 \mid y_3) = \frac{(0.5)(0.2)}{(0.5)(0.2) + (1)(0.8)} = 0.111$
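
Sequential updating can be sketched as a loop in which each posterior becomes the prior for the next unaffected son. The function name `update` and its arguments are illustrative; the probabilities 0.5 and 1 are the per-son likelihoods used in the slides.

```python
def update(prior_carrier, lik_carrier=0.5, lik_noncarrier=1.0):
    """One Bayes update of Pr(carrier) after observing one unaffected son."""
    numerator = lik_carrier * prior_carrier
    return numerator / (numerator + lik_noncarrier * (1.0 - prior_carrier))


p = 0.5            # prior Pr(theta = 1)
history = []
for _ in range(3):  # three unaffected sons, one at a time
    p = update(p)
    history.append(p)

# Batch version: all three sons in a single application of Bayes' theorem.
batch = (0.5**3 * 0.5) / (0.5**3 * 0.5 + 1.0 * 0.5)

print("after each son:", [round(v, 3) for v in history])
print("batch posterior:", round(batch, 3))
```

The sequential and batch answers agree, so the order and grouping of the observations do not change the final posterior.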

Slide 96

Slide 96 text

Hierarchical Models

Slide 97

Slide 97 text

example: effectiveness of cardiac surgery

Slide 98

Slide 98 text

Hospital  Operations  Deaths
A         47          0
B         148         18
C         119         8
D         810         46
E         211         8
F         196         13
G         148         9
H         215         31
I         207         14
J         97          8
K         256         29
L         360         24

Slide 99

Slide 99 text

clustering induces dependence between observations

Slide 100

Slide 100 text

parameters sampled from common distribution: $\theta_j$ = hospital $j$ mortality rate

Slide 101

Slide 101 text

population distribution: $\theta_j \sim f(\Theta)$, with hyperparameters $\Theta$

Slide 102

Slide 102 text

[diagram: parameters $\theta_1, \theta_2, \ldots, \theta_k$, each linked to deaths $y_1, y_2, \ldots, y_k$]

Slide 103

Slide 103 text

[diagram: hyperparameters $\mu, \sigma^2$ linked to parameters $\theta_1, \theta_2, \ldots, \theta_k$, each linked to deaths $y_1, y_2, \ldots, y_k$]

Slide 104

Slide 104 text

[diagram: $\phi_\mu, \phi_\sigma$ linked to hyperparameters $\mu, \sigma^2$, linked to parameters $\theta_1, \theta_2, \ldots, \theta_k$, each linked to deaths $y_1, y_2, \ldots, y_k$]

Slide 105

Slide 105 text

non-hierarchical models of hierarchical data can easily be underfit or overfit

Slide 106

Slide 106 text

“experiments”: $j = 1, \ldots, J$; likelihood: $\text{deaths}_j \sim \text{Binomial}(\text{operations}_j, \theta_j)$; population model: $\text{logit}(\theta_j) \sim N(\mu, \sigma^2)$; priors: $\mu \sim P_\mu$, $\sigma^2 \sim P_\sigma$
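
To see what this model implies, one can simulate from it forward: draw each hospital's logit rate from the population distribution, then draw its deaths from a binomial. This is only a sketch. The slides leave the priors $P_\mu$ and $P_\sigma$ unspecified, so `mu` and `sigma` are fixed at hypothetical values here instead of being drawn from priors, and the operation counts are taken from the hospital table.

```python
import math
import random

random.seed(0)

# Operations per hospital (A through L), from the table on the earlier slide.
operations = [47, 148, 119, 810, 211, 196, 148, 215, 207, 97, 256, 360]

# Hypothetical hyperparameter values on the logit scale (not from the slides).
mu, sigma = -2.5, 0.4


def inv_logit(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binomial_draw(n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))


# Population model: logit(theta_j) ~ N(mu, sigma^2)
thetas = [inv_logit(random.gauss(mu, sigma)) for _ in operations]

# Likelihood: deaths_j ~ Binomial(operations_j, theta_j)
deaths = [binomial_draw(n, theta) for n, theta in zip(operations, thetas)]

for n, theta, d in zip(operations, thetas, deaths):
    print(f"operations = {n:4d}  theta = {theta:.3f}  simulated deaths = {d}")
```

Fitting the model runs in the other direction, from the observed deaths back to the $\theta_j$, $\mu$, and $\sigma^2$; that step is not sketched here.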

Slide 107

Slide 107 text

0/47 = 0; 18/148 = 0.12; 8/119 = 0.07; 46/810 = 0.06
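
The raw rates above are the "no pooling" estimates, one per hospital. A short sketch computes them for all twelve hospitals in the table, next to the single "complete pooling" rate that ignores hospital identity. The variable names are illustrative.

```python
# Operations and deaths per hospital, from the table on the earlier slide.
operations = {"A": 47, "B": 148, "C": 119, "D": 810, "E": 211, "F": 196,
              "G": 148, "H": 215, "I": 207, "J": 97, "K": 256, "L": 360}
deaths = {"A": 0, "B": 18, "C": 8, "D": 46, "E": 8, "F": 13,
          "G": 9, "H": 31, "I": 14, "J": 8, "K": 29, "L": 24}

# No pooling: each hospital's rate is estimated from its own data only.
rates = {h: deaths[h] / operations[h] for h in operations}

# Complete pooling: one rate for all hospitals combined.
total_deaths = sum(deaths.values())
total_operations = sum(operations.values())
pooled = total_deaths / total_operations

for h in operations:
    print(f"{h}: {deaths[h]:2d}/{operations[h]:3d} = {rates[h]:.2f}")
print(f"pooled: {total_deaths}/{total_operations} = {pooled:.3f}")
```

Hospital A's raw rate of exactly 0 from only 47 operations shows why per-hospital estimates can be unreliable, while the pooled rate hides any real differences between hospitals; a hierarchical model sits between the two.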

Slide 108

Slide 108 text


Slide 109

Slide 109 text
