Slide 1

Slide 1 text

Bayesian Statistics without Frequentist Language
Richard McElreath
Max Planck Institute for Evolutionary Anthropology, Leipzig

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Outside view R.A. Fisher (1890–1962)

Slide 4

Slide 4 text

Outside view
• Data have distributions
• Parameters do not
• Distinguish parameters and statistics
• Likelihood is not a probability distribution
• Imaginary population
• Bayes is sampling theory + priors
• Priors are uniquely subjective

Slide 5

Slide 5 text

Lineage of complaints Dennis Lindley (1923–2013)

Slide 6

Slide 6 text

Conceptual friction
Common barriers:
• Thinking data must look like the likelihood function
• Degrees of freedom
• “Sampling” as the source of all uncertainty
• Defining random effects via sampling design
• Neglect of data uncertainty
• (add your own)

Slide 7

Slide 7 text

My Book is Neo-Colonial
• I feel bad about choices made
• Uses outsider perspective: “likelihood”, “parameter”, “estimate”
• Like explaining Indian politics using British political parties
• Perpetuates confusion
• Historical necessity?

Slide 8

Slide 8 text

Another path
• Claim: Bayes is easier and more powerful when understood from the inside
• Problem: Many insider views

[Screenshot: I.J. Good (1971), Chapter 3, “46656 Varieties of Bayesians” (#765): “Some attacks and defenses of the Bayesian position assume that it is unique, so it should be helpful to point out that there are at least 46656 different interpretations. This is shown by the following classification based on eleven facets. ... All Bayesians, as I understand the term, believe that it is usually meaningful to talk about the probability of a hypothesis and they make some attempt to be consistent in their judgments.” The facets include extremeness, utilities, quasiutilities, physical probabilities, intuitive probability, the device of imaginary results, and axioms.]

Slide 9

Slide 9 text

Insider perspective
• Bayesian approach: A joint generative model of all variables
• Key ideas:
• Unity among variables: No deep distinction between data and parameters
• Unity among distributions: No deep distinction between likelihoods and priors
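A compact way to state this (a standard identity, not anything specific to these slides): inference is just conditioning a joint model on whatever happens to be observed.

Pr(y, θ) = Pr(y | θ) Pr(θ)
Pr(θ | y) = Pr(y, θ) / Pr(y)

Which factor gets called “likelihood” and which “prior” depends only on which of y and θ is observed; the joint model Pr(y, θ) makes no such distinction.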

Slide 10

Slide 10 text

b ∼ Normal(θ, σ)

Likelihood or Prior?

Slide 11

Slide 11 text

b ∼ Normal(θ, σ)

Likelihood or Prior?

If b is observed, likelihood. If b is unobserved, prior.
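A minimal sketch of this point in Stan (hypothetical programs; σ is fixed to 1 and the Normal(0, 10) prior on θ is an arbitrary choice). The sampling statement is identical in both programs; only the declaration of b moves between the data and parameters blocks.

// Program 1: b declared as data, so the statement is a likelihood.
data {
  real b;
}
parameters {
  real theta;
}
model {
  theta ~ normal(0, 10);  // prior for theta (arbitrary choice)
  b ~ normal(theta, 1);   // b observed: this line is the likelihood
}

// Program 2: b declared as a parameter, so the identical statement is a prior.
data {
  real theta;
}
parameters {
  real b;
}
model {
  b ~ normal(theta, 1);   // b unobserved: the same line is now a prior
}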

Slide 12

Slide 12 text

Corner cases
• In conventional GLMs, no problem distinguishing data from parameters.
• But what about:
• GLMMs
• Missing data
• Measurement error
• Many strange machines

Slide 13

Slide 13 text

[Diagram: variables notes, cat, rate of singing when cat present, rate of singing when cat absent.]

Slide 14

Slide 14 text

[Same diagram, with the variables grouped: observed variables (notes, cat) and unobserved variables (rate of singing when cat present, rate of singing when cat absent).]

Slide 15

Slide 15 text

Joint model

Prob(notes, cat, rate-if-cat, rate-if-no-cat)

Slide 16

Slide 16 text

Joint model

Prob(notes, cat, rate-if-cat, rate-if-no-cat)

Skeleton, distributions not yet chosen:
notes_t ∼ ?(λ_t)
λ_t = (1 − cat_t)α + cat_t β
α ∼ ?(.)
β ∼ ?(.)

With distributions chosen:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)
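Written out (one way to do it, conditioning on the observed cat_t), the joint model above factors as

Pr(notes, α, β | cat) = [ ∏_t Poisson(notes_t | λ_t) ] × Exponential(α | 1/10) × Exponential(β | 1/10),
where λ_t = (1 − cat_t)α + cat_t β.

Every variable sits on the left of exactly one distribution; nothing in the notation marks α and β as “parameters” rather than “data”.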

Slide 17

Slide 17 text

How is a prior formed?
• What pre-data information do we have about unobserved variables?
• Rates are positive real values. Model the expected value ==maxent==> Exponential.
• This is the most conservative distribution consistent with that information.
• Like priors, likelihoods are pre-data distributions.
• Use pre-data information (meta-data) to build them.
• Notes are zero or positive integers. Model the expected value ==maxent==> Poisson.
• Again, the most conservative distribution consistent with that information.
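To make the maxent step concrete for the rates (a standard result, stated here for reference): among all continuous distributions on (0, ∞) with fixed mean μ, the Exponential distribution has maximum entropy,

H = 1 + log μ.

Committing only to an expected rate, and to nothing else, therefore yields the Exponential; the slides apply the analogous argument to the note counts to arrive at the Poisson.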

Slide 18

Slide 18 text

[Same text as Slide 17.]

Slide 19

Slide 19 text

notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

Stan code:

data{
  int N;
  int notes[N];
  int cat[N];
}
parameters{
  real alpha;
  real beta;
}
model{
  vector[N] lambda;
  beta ~ exponential( 0.1 );
  alpha ~ exponential( 0.1 );
  for ( i in 1:N ) {
    lambda[i] = (1 - cat[i]) * alpha + cat[i] * beta;
  }
  notes ~ poisson( lambda );
}

map2stan code:

notes ~ poisson(lambda),
lambda <- (1-cat)*alpha + cat*beta,
alpha ~ exponential(0.1),
beta ~ exponential(0.1)

https://gist.github.com/rmcelreath

Slide 20

Slide 20 text

GLMM birds
• Multiple birds, each with their own rates:

Single-bird model:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

Multi-bird model:
notes_it ∼ Poisson(λ_it)
λ_it = (1 − cat_it)α_i + cat_it β_i
α_i ∼ Exponential(1/ᾱ)
β_i ∼ Exponential(1/β̄)
ᾱ ∼ Exponential(1/10)
β̄ ∼ Exponential(1/10)

Slide 21

Slide 21 text

[Excerpt from Andrew Gelman, “Analysis of Variance—Why It Is More Important Than Ever,” The Annals of Statistics 2005, Vol. 33, No. 1, 1–53:]

... and random effects. It turns out that different—in fact, incompatible—definitions are used in different contexts. [See also Kreft and de Leeuw (1998), Section 1.3.3, for a discussion of the multiplicity of definitions of fixed and random effects and coefficients, and Robinson (1998) for a historical overview.] Here we outline five definitions that we have seen:

1. Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts α_i and fixed slope β corresponds to parallel lines for different individuals i, or the model y_it = α_i + βt. Kreft and de Leeuw [(1998), page 12] thus distinguish between fixed and random coefficients.

2. Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella and McCulloch [(1992), Section 1.4] explore this distinction in depth.

3. “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random” [Green and Tukey (1960)].

4. “If an effect is assumed to be a realized value of a random variable, it is called a random effect” [LaMotte (1983)].

5. Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage [“linear unbiased prediction” in the terminology of Robinson (1991)]. This definition is standard in the multilevel modeling literature [see, e.g., Snijders and Bosker (1999), Section 4.2] and in econometrics. In the Bayesian framework, this definition implies that fixed effects β_j^(m) are estimated conditional on σ_m = ∞ and random effects β_j^(m) are estimated conditional on σ_m from the posterior distribution.

Of these definitions, the first clearly stands apart, but the other four definitions differ also. Under the second definition, an effect can change from fixed to ...

Slide 22

Slide 22 text

[Same excerpt as Slide 21.]

Slide 23

Slide 23 text

[Same excerpt as Slide 21.]

Slide 24

Slide 24 text

[Same excerpt as Slide 21.]

Slide 25

Slide 25 text

[Same excerpt as Slide 21.]

Slide 26

Slide 26 text

GLMM birds
• Shrinkage happens everywhere

Single-bird model:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

Multi-bird model:
notes_it ∼ Poisson(λ_it)
λ_it = (1 − cat_it)α_i + cat_it β_i
α_i ∼ Exponential(1/ᾱ)
β_i ∼ Exponential(1/β̄)
ᾱ ∼ Exponential(1/10)
β̄ ∼ Exponential(1/10)
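The mechanism is easiest to display in the conjugate normal case (an illustration, not the Poisson-Exponential model above). If y_i ∼ Normal(θ_i, σ²) and θ_i ∼ Normal(μ, τ²), the posterior mean of each θ_i is a precision-weighted average,

E[θ_i | y_i] = (τ² y_i + σ² μ) / (τ² + σ²),

pulled from the individual observation y_i toward the population mean μ. The same pull toward the prior mean acts on every unobserved variable, whether convention labels it a “random effect” or a “parameter”.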

Slide 27

Slide 27 text

Efron’s example of “shrinkage estimator”

Slide 28

Slide 28 text

Galton’s “regression to mean”

Slide 29

Slide 29 text

Stan code:

data{
  int N;
  int N_id;
  int notes[N];
  int cat[N];
  int id[N];
}
parameters{
  vector[N_id] alpha;
  vector[N_id] beta;
  real alpha_bar;
  real beta_bar;
}
model{
  vector[N] lambda;
  beta_bar ~ exponential( 0.1 );
  alpha_bar ~ exponential( 0.1 );
  beta ~ exponential( 1.0/beta_bar );
  alpha ~ exponential( 1.0/alpha_bar );
  for ( i in 1:N ) {
    lambda[i] = (1 - cat[i]) * alpha[id[i]] + cat[i] * beta[id[i]];
  }
  notes ~ poisson( lambda );
}

map2stan code:

notes ~ poisson(lambda),
lambda <- (1-cat)*alpha[id] + cat*beta[id],
alpha[id] ~ exponential(1.0/alpha_bar),
beta[id] ~ exponential(1.0/beta_bar),
alpha_bar ~ exponential(0.1),
beta_bar ~ exponential(0.1)

Model:
notes_it ∼ Poisson(λ_it)
λ_it = (1 − cat_it)α_i + cat_it β_i
α_i ∼ Exponential(1/ᾱ)
β_i ∼ Exponential(1/β̄)
ᾱ ∼ Exponential(1/10)
β̄ ∼ Exponential(1/10)

https://gist.github.com/rmcelreath

Slide 30

Slide 30 text

Bad data, good cats
• Jointly model cat behavior:

[Plot: the Beta(4, 4) density, dbeta(x, 4, 4), on x ∈ [0, 1].]

Detection error on cats:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
cat_obs,t ∼ Bernoulli(cat_t × 0.5)
cat_t ∼ Bernoulli(0.5)
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

Missing cat data:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
cat_t ∼ Bernoulli(κ)
κ ∼ Beta(4, 4)
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

Slide 31

Slide 31 text

Bad data, good cats
• Useful when some data go missing: some cat_t observations are unavailable (cats stepped on the keyboard).
• The same distribution does double duty:

[Same Beta(4, 4) plot and models as Slide 30.]
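The “double duty” is just the sum rule. When cat_t is missing, it is summed out of the joint model:

Pr(notes_t) = κ · Poisson(notes_t | β) + (1 − κ) · Poisson(notes_t | α).

When cat_t is observed, the same Bernoulli(κ) distribution enters as a likelihood term instead. The log_mix statement in the Stan code on the next slide computes exactly this mixture on the log scale.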

Slide 32

Slide 32 text

Stan code:

parameters{
  real kappa;
  real beta;
  real alpha;
}
model{
  beta ~ exponential( 0.1 );
  alpha ~ exponential( 0.1 );
  kappa ~ beta( 4 , 4 );
  for ( i in 1:N ) {
    if ( cat[i]==-1 ) {
      // cat missing: marginalize over the unknown cat state
      target += log_mix( kappa ,
        poisson_lpmf( notes[i] | beta ),
        poisson_lpmf( notes[i] | alpha ) );
    } else {
      // cat not missing
      cat[i] ~ bernoulli(kappa);
      notes[i] ~ poisson( (1-cat[i])*alpha + cat[i]*beta );
    }
  }//i
}

map2stan code:

notes ~ poisson(lambda),
lambda <- (1-cat)*alpha + cat*beta,
cat ~ bernoulli(kappa),
kappa ~ beta(4,4),
alpha ~ exponential(0.1),
beta ~ exponential(0.1)

Model:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
cat_t ∼ Bernoulli(κ)
κ ∼ Beta(4, 4)
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

https://gist.github.com/rmcelreath

Slide 33

Slide 33 text

Stan code:

generated quantities{
  vector[N] cat_impute;
  for ( i in 1:N ) {
    real logPxy;
    real logPy;
    if ( cat[i]==-1 ) {
      logPxy = log(kappa) + poisson_lpmf( notes[i] | beta );
      logPy = log_mix( kappa ,
        poisson_lpmf( notes[i] | beta ),
        poisson_lpmf( notes[i] | alpha ) );
      cat_impute[i] = exp( logPxy - logPy );
    } else {
      cat_impute[i] = cat[i];
    }
  }//i
}

Model:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
cat_t ∼ Bernoulli(κ)
κ ∼ Beta(4, 4)
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

Results:
                 Mean  StdDev  lower 0.89  upper 0.89  n_eff  Rhat
kappa            0.52    0.13        0.30        0.72   1000     1
beta             7.40    1.44        5.00        9.52   1000     1
alpha           17.48    2.49       13.61       21.43   1000     1
cat_impute[1]    0.75    0.21        0.44        1.00   1000     1
cat_impute[2]    0.00    0.00        0.00        0.00   1000   NaN
cat_impute[3]    1.00    0.00        1.00        1.00   1000   NaN
cat_impute[4]    0.01    0.03        0.00        0.01    611     1
cat_impute[5]    1.00    0.00        1.00        1.00   1000   NaN
cat_impute[6]    0.00    0.00        0.00        0.00   1000   NaN
cat_impute[7]    1.00    0.00        1.00        1.00   1000   NaN

https://gist.github.com/rmcelreath
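The imputed value is Bayes' theorem applied to the missing cat state, which is what logPxy - logPy computes on the log scale:

Pr(cat_t = 1 | notes_t) = κ · Poisson(notes_t | β) / [ κ · Poisson(notes_t | β) + (1 − κ) · Poisson(notes_t | α) ].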

Slide 34

Slide 34 text

Sly cats
• Cats are hard to detect! Birds always see them, but the data logger misses them half the time.
• Unobserved cats act as both “parameter” and “data”
• Occupancy model

Detection error on cats:
notes_t ∼ Poisson(λ_t)
λ_t = (1 − cat_t)α + cat_t β
cat_obs,t ∼ Bernoulli(cat_t × δ)
cat_t ∼ Bernoulli(κ)
κ ∼ Beta(4, 4)
δ ∼ Beta(4, 4)
α ∼ Exponential(1/10)
β ∼ Exponential(1/10)

(compare the missing-cat-data model of Slides 30–33)
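Because cat_t is never observed directly, it is summed out of the joint model. With detection probability δ, the two observable cases are

Pr(notes_t, cat_obs,t = 1) = κ δ · Poisson(notes_t | β)
Pr(notes_t, cat_obs,t = 0) = κ (1 − δ) · Poisson(notes_t | β) + (1 − κ) · Poisson(notes_t | α)

The first requires a present, detected cat; the second mixes “present but undetected” with “absent”. This is the log_sum_exp computation in the Stan code on the next slide.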

Slide 35

Slide 35 text

Stan code:

model {
  beta ~ exponential( 0.1 );
  alpha ~ exponential( 0.1 );
  kappa ~ beta(4,4);
  delta ~ beta(4,4);
  for ( i in 1:N ) {
    if ( cat[i]==1 )
      // cat present and detected
      target += log(kappa) + log(delta) + poisson_lpmf( notes[i] | beta );
    if ( cat[i]==0 ) {
      // cat not observed, but cannot be sure not there
      // marginalize over unknown cat state:
      //   (1) cat present and not detected
      //   (2) cat absent
      target += log_sum_exp(
        log(kappa) + log1m(delta) + poisson_lpmf( notes[i] | beta ),
        log1m(kappa) + poisson_lpmf( notes[i] | alpha ) );
    }//cat==0
  }//i
}

[Model: detection error on cats, as on Slide 34.]

Results:
        Mean  StdDev  lower 0.89  upper 0.89  n_eff  Rhat
beta    7.70    1.42        5.30        9.74   1000     1
alpha  18.13    2.57       14.47       22.46   1000     1
kappa   0.54    0.12        0.34        0.75   1000     1
delta   0.66    0.13        0.47        0.88   1000     1

https://gist.github.com/rmcelreath

Slide 36

Slide 36 text

http://panafrican.eva.mpg.de/

Slide 37

Slide 37 text

Four Unifying Forces
• Unity of data/parameters, likelihoods/priors:
1. Same derivations & calculations
2. Same inferential force (e.g., shrinkage)
3. Do double duty, conditional on observation
4. Can be both in the same analysis

Slide 38

Slide 38 text

Benefits of the insider view
• Not necessary, but useful
• Think scientifically, not statistically
• Define a generative model of all variables
• Use observed variables in inference
• Direct solutions to common problems
• Measurement messes: propagate uncertainty
• But lots of computational challenges remain!
• Unified approach to construction
• Demystifying. Deflationary.
• Helps in teaching — Bayes is NOT likelihood + priors

Slide 39

Slide 39 text

A Modest Proposal

Convention    Proposal
Data          Observed variable
Parameter     Unobserved variable
Likelihood    Distribution
Prior         Distribution
Posterior     Conditional distribution
Estimate      (banished)
Random        (banished)