
# Bayesian Inference is Just Counting

Conceptual introduction to Bayesian inference and data analysis, with a little causal inference at the end

## Richard McElreath

February 05, 2020

## Transcript

1. BAYESIAN INFERENCE
IS JUST COUNTING
Richard McElreath
MPI-EVA
p(x|y)p(y)/p(x)

2. The Golem of Prague
go•lem |gōlǝm|
noun
• (in Jewish legend) a clay figure
brought to life by magic.
• an automaton or robot.
ORIGIN late 19th cent.: from Yiddish
goylem, from Hebrew gōlem
‘shapeless mass.’

3. The Golem of Prague
“Even the most perfect of Golem, risen to
life to protect us, can easily change into a
destructive force. Therefore let us treat
carefully that which is strong, just as we
bow kindly and patiently to that which is
weak.”
Rabbi Judah Loew ben
Bezalel (1512–1609)
From Breath of Bones: A Tale of the Golem

4. The Golems of Science
Golem
• Animated by “truth”
• Powerful
• Blind to creator’s intent
• Easy to misuse
• Fictional
Model
• Animated by “truth”
• Hopefully powerful
• Blind to creator’s intent
• Easy to misuse
• Not even false

5. Bayesian data analysis
• Use probability to describe uncertainty
• Extends ordinary logic (true/false) to
continuous plausibility
• Computationally difficult
• Markov chain Monte Carlo (MCMC) to
the rescue
• Used to be controversial
• Ronald Fisher: Bayesian analysis “must be
wholly rejected.”
Pierre-Simon Laplace (1749–1827)
Sir Harold Jeffreys (1891–1989)
Bertha Swirles Jeffreys (1903–1999)

6. Bayesian data analysis
Count all the ways data can happen,
according to assumptions.
Assumptions with more ways that are
consistent with data are more
plausible.

7. Bayesian data analysis
• Contrast with frequentist view
• Probability is just limiting
frequency
• Uncertainty arises from sampling
variation
• Bayesian probability much
more general
• Probability is in the golem, not in
the world
• Coins are not random, but our
ignorance makes them so
Saturn as Galileo saw it

8. Garden of Forking Data
• The future:
• Full of branching paths
• Each choice closes some
• The data:
• Many possible events
• Each observation eliminates some

9. Garden of Forking Data
A bag contains 4 marbles, each either blue or white.
Possible contents (five conjectures, from zero to four blue):
(1) [W W W W] (2) [B W W W] (3) [B B W W] (4) [B B B W] (5) [B B B B]
Observe: a sequence of draws, B, W, B

10. Conjecture: Data:

11. Conjecture: Data:

12. Conjecture: Data:

13. Conjecture: Data:
3 paths consistent with data

14. Garden of Forking Data
Possible contents:   Ways to produce
(1) [W W W W]        ?
(2) [B W W W]        3
(3) [B B W W]        ?
(4) [B B B W]        ?
(5) [B B B B]        ?

15. Garden of Forking Data
Possible contents:   Ways to produce
(1) [W W W W]        0
(2) [B W W W]        3
(3) [B B W W]        ?
(4) [B B B W]        ?
(5) [B B B B]        0

16. 3 ways
9 ways
8 ways

17. 3 ways
9 ways
8 ways

18. 3 ways
9 ways
8 ways

19. Garden of Forking Data
So far we've considered five different conjectures about the contents of the bag, ranging from zero blue marbles to four blue marbles. For each of these conjectures, we've counted up how many sequences, paths through the garden of forking data, could potentially produce the observed data.

Conjecture    Ways to produce
[W W W W]     0 × 4 × 0 = 0
[B W W W]     1 × 3 × 1 = 3
[B B W W]     2 × 2 × 2 = 8
[B B B W]     3 × 1 × 3 = 9
[B B B B]     4 × 0 × 4 = 0

The number of ways to produce the data for each conjecture can be computed by counting the number of paths in each "ring" of the garden and then by multiplying these counts together. This is just a computational device. It tells us the same thing as the figure, but without having to draw the garden. The fact that numbers are multiplied during calculation doesn't change the fact that this is still just counting of logically possible paths.
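The counting scheme on this slide can be sketched in code. A minimal Python check, assuming the observed draw sequence blue, white, blue and bags of four marbles: brute-force enumeration of every path through the garden agrees with the ring-by-ring multiplication shortcut.

```python
from itertools import product

# Observed sequence of draws (with replacement), assumed from the slides: B, W, B.
data = ["B", "W", "B"]
# Five conjectures about the bag: k blue marbles, 4 - k white, for k = 0..4.
conjectures = [["W"] * (4 - k) + ["B"] * k for k in range(5)]

def ways_by_enumeration(bag, data):
    # Walk every possible sequence of marble draws; keep those matching the data.
    return sum(all(d == o for d, o in zip(seq, data))
               for seq in product(bag, repeat=len(data)))

def ways_by_multiplication(bag, data):
    # The slide's shortcut: multiply the path counts ring by ring.
    n = 1
    for obs in data:
        n *= bag.count(obs)
    return n

for bag in conjectures:
    assert ways_by_enumeration(bag, data) == ways_by_multiplication(bag, data)
    print(bag.count("B"), "blue:", ways_by_multiplication(bag, data))
```

Both routes give the same counts, because multiplication is just a compressed way of counting paths.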

20. Updating
Another draw from the bag: B

You could start over, counting the paths compatible with the full data sequence. Or you could take the previous counts over conjectures and just update them in light of the new observation. It turns out that these two methods are mathematically identical, as long as the new observation is logically independent of the previous observations.

Here's how to do it. First we count the number of ways each conjecture could produce the new observation. Then we multiply each of these new counts by the previous number for each conjecture. In table form:

Conjecture    Ways to produce B   Previous counts   New count
[W W W W]     0                 × 0               = 0
[B W W W]     1                 × 3               = 3
[B B W W]     2                 × 8               = 16
[B B B W]     3                 × 9               = 27
[B B B B]     4                 × 0               = 0

The new counts in the right-hand column summarize all the evidence for each conjecture. As new data arrive, and provided those data are independent of previous observations, the number of logically possible ways for a conjecture to produce all the data can be computed just by multiplying the new count by the old count. This updating approach amounts to nothing more than asserting that, when we have previous information suggesting there are W_prior ways for a conjecture to produce previous data, and new data that the same conjecture can produce in W_new ways, the conjecture can account for everything in W_prior × W_new ways.
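The update on this slide is a single multiplication per conjecture. A small sketch, assuming the prior path counts [0, 3, 8, 9, 0] from the earlier draws and one new blue draw:

```python
# Prior path counts per conjecture (0..4 blue marbles), from the earlier draws.
prior_counts = [0, 3, 8, 9, 0]
# A conjecture with k blue marbles has k ways to produce one more blue draw.
ways_new_blue = [k for k in range(5)]
# Updating = multiply new ways by old counts, conjecture by conjecture.
new_counts = [w * p for w, p in zip(ways_new_blue, prior_counts)]
print(new_counts)  # [0, 3, 16, 27, 0]
```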

21. Using other information
Factory says: blue marbles are rare, but every bag contains at least one.

In this example, the prior data and new data are of the same type: marbles drawn from the bag. But in general, the prior data and new data can be of different types. Suppose for example that someone from the marble factory tells you that blue marbles are rare. So for every bag containing [B B B W], they made two bags containing [B B W W] and three bags containing [B W W W]. They also ensured that every bag contained at least one blue and one white marble. We can update our counts again:

Conjecture    Prior ways   Factory count   New count
[W W W W]     0          × 0             = 0
[B W W W]     3          × 3             = 9
[B B W W]     16         × 2             = 32
[B B B W]     27         × 1             = 27
[B B B B]     0          × 0             = 0

Now conjecture [B B W W] is most plausible, but barely better than [B B B W]. Is there a threshold difference in these counts at which we can safely decide that one of the conjectures is the correct one? You'll spend the next chapter exploring that question. Original ignorance: which assumption should we use when there is no previous information?

22. Using other information
Factory says: blue marbles are rare.

23. Counts to plausibility
Unglamorous basis of applied probability:
Things that can happen more ways are more plausible.

To standardize is to add up all of the products, one for each value p can take:

ways p can produce D_new × prior plausibility of p

And then divide each product by the sum of products:

plausibility of p after D_new = (ways p can produce D_new × prior plausibility of p) / (sum of products)

There's nothing special really about standardizing to one. Any value will do. But using this number ends up making the mathematics more convenient.

Consider again the table from before, now updated using our definitions of p and "plausibility":

Possible composition   p      Ways to produce data   Plausibility
[W W W W]              0      0                      0
[B W W W]              0.25   3                      0.15
[B B W W]              0.5    8                      0.40
[B B B W]              0.75   9                      0.45
[B B B B]              1      0                      0

You can quickly compute these plausibilities in R:

ways <- c(3, 8, 9)

24. Counts to plausibility
You can quickly compute these plausibilities in R:

R code
ways <- c(3, 8, 9)
ways / sum(ways)
[1] 0.15 0.40 0.45

These plausibilities are also probabilities: they are non-negative real numbers that sum to one. And all of the mathematical things you can do with probabilities you can also do with these values. Specifically, each piece of the calculation has a direct partner in applied probability theory. These partners have stereotyped names, so it's worth learning them.

25. Counts to plausibility
Plausibility is probability: Set of non-negative real
numbers that sum to one.
Probability theory is just a set of shortcuts for
counting possibilities.
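The slide's R one-liner has a direct Python equivalent; `ways` holds the path counts for the three conjectures that have at least one way to produce the data.

```python
# Standardizing path counts so they sum to one turns them into probabilities,
# mirroring the R code `ways <- c(3, 8, 9); ways / sum(ways)`.
ways = [3, 8, 9]
plausibility = [w / sum(ways) for w in ways]
print(plausibility)  # [0.15, 0.4, 0.45]
```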

26. Building a model
• How to use probability to do typical statistical
modeling?
1. Design the model (data story)
2. Condition on the data (update)
3. Evaluate the model (critique)

27. Nine tosses of the globe:
W L W W W L W L W

28. Design > Condition > Evaluate
• Data story motivates the model
• How do the data arise?
• For W L W W W L W L W:
• Some true proportion of water, p
• Toss globe, probability p of observing W, 1–p of L
• Each toss therefore independent of other tosses
• Translate data story into probability statements

29. Design > Condition > Evaluate
• Bayesian updating defines optimal learning in small
world, converts prior into posterior
• Give your golem an information state, before the data:
Here, an initial confidence in each possible value of p
between zero and one
• Condition on data to update information state: New
confidence in each value of p, conditional on data

30. [Figure: Bayesian updating on the toss sequence W L W W W L W L W. Panels for n = 1, 2, 4, 5 show confidence curves over the probability of water (0 to 1), with the prior plotted for comparison against p, the proportion W.]

31. [Figure: the same panels, now labeling the prior and posterior curves over p, the proportion W.]

32.–36. [Figure: step-by-step build of the full updating sequence. Nine panels, n = 1 through n = 9, one per toss of W L W W W L W L W, each plotting plausibility against the proportion of water (0 to 1), starting from the flat n = 0 prior.]

37. Design > Condition > Evaluate
• Data order irrelevant, because
golem assumes order irrelevant
• All-at-once, one-at-a-time,
shuffled order all give same
posterior
• Every posterior is a prior for
next observation
• Every prior is posterior of
some other inference
• Sample size automatically
embodied in posterior
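Order irrelevance is easy to verify numerically. A grid-approximation sketch (the grid itself is an implementation choice, not from the slide): updating one toss at a time gives the same posterior whether the tosses arrive in their original order or shuffled.

```python
import random

# Grid of candidate values for p, the proportion of water.
grid = [i / 100 for i in range(101)]
tosses = list("WLWWWLWLW")  # the toss sequence from the slides

def posterior(data):
    post = [1.0] * len(grid)  # flat prior over the grid
    for obs in data:
        # Likelihood of one toss: p for water, 1 - p for land.
        like = [p if obs == "W" else 1 - p for p in grid]
        post = [po * li for po, li in zip(post, like)]
    total = sum(post)
    return [po / total for po in post]

shuffled = tosses[:]
random.shuffle(shuffled)
p1, p2 = posterior(tosses), posterior(shuffled)
# Multiplying in each likelihood commutes, so order cannot matter.
assert all(abs(a - b) < 1e-9 for a, b in zip(p1, p2))
```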
Figure 2.5: How a Bayesian model learns. Each toss of the globe produces an observation of water (W) or land (L). The model's estimate of the proportion of water on the globe is a plausibility for every possible value. The lines and curves in this figure are these collections of plausibilities. In each plot, previous plausibilities (dashed curve) are updated in light of the latest observation (solid curve).

38. Design > Condition > Evaluate
• Bayesian inference: Logical answer to
a question in the form of a model

“How plausible is each proportion of
water, given these data?”
• Golem must be supervised
• Did the golem malfunction?
• Does the golem’s answer make sense?
• Does the question make sense?
• Check sensitivity of answer to changes in
assumptions

39. Construction perspective
• Build joint model:
(1) List variables
(2) Define generative relations
(3) ???
(4) Profit
• Input: Joint prior
• Deduce: Joint posterior

40. The Joint Model
W ∼ Binomial(N, p)
p ∼ Uniform(0, 1)
• Bayesian models are generative
• Can be run forward to generate predictions or simulate data
• Can be run in reverse to infer process from data
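Running the joint model forward is a two-line simulation. A sketch in base Python with N = 9 tosses, as in the globe example; a known consequence of the flat prior is that, a priori, every count of W from 0 to N is equally likely.

```python
import random

random.seed(1)
N = 9  # number of globe tosses

def simulate_one():
    p = random.random()                              # p ~ Uniform(0, 1)
    W = sum(random.random() < p for _ in range(N))   # W ~ Binomial(N, p)
    return p, W

samples = [simulate_one() for _ in range(10_000)]
# Prior predictive of W: with a flat prior on p, counts 0..9 are equally likely,
# so each bin should hold roughly 1000 of the 10,000 simulations.
counts = [sum(1 for _, W in samples if W == k) for k in range(N + 1)]
print(counts)
```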

41. The Joint Model
W ∼ Binomial(N, p)
p ∼ Uniform(0, 1)
• Run forward:

42. Run in Reverse:
Computing the posterior
1. Analytical approach (often impossible)
2. Grid approximation (very intensive)
3. Quadratic approximation (limited)
4. Markov chain Monte Carlo (intensive)
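Grid approximation fits in a few lines. A sketch for the globe data (6 W in 9 tosses), with the grid resolution chosen arbitrarily: define a grid over p, compute prior times likelihood at each point, then standardize.

```python
from math import comb

W, N = 6, 9  # six waters in nine tosses
grid = [i / 100 for i in range(101)]          # candidate values of p
prior = [1.0] * len(grid)                     # flat prior
likelihood = [comb(N, W) * p**W * (1 - p)**(N - W) for p in grid]
unstd = [pr * li for pr, li in zip(prior, likelihood)]
posterior = [u / sum(unstd) for u in unstd]   # standardize to sum to one
print(grid[posterior.index(max(posterior))])  # posterior mode near 6/9
```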

43. Predictive checks
• Something like a significance test, but not
• No universally best way to evaluate
• No way to justify always using a
threshold like 5%
• Good predictive checks always depend
upon purpose and imagination
“It would be very nice to
have a formal apparatus that
gives us some ‘optimal’ way
of recognizing unusual
phenomena and inventing
new classes of hypotheses
[...]; but this remains an art
for the creative human
mind.”
—E.T. Jaynes (1922–1998)

44. Triumph of Geocentrism
• Claudius Ptolemy (90–168)
• Egyptian mathematician
• Accurate model of planetary
motion
• Epicycles: orbits on orbits
• Fourier series
Earth
equant
planet
epicycle
deferent

45. Geocentrism
• Descriptively accurate
• Mechanistically wrong
• General method of
approximation
• Known to be wrong
Regression
• Descriptively accurate
• Mechanistically wrong
• General method of
approximation
• Taken too seriously

46. Linear regression
• Simple statistical golems
• Model of mean and variance of normally (Gaussian)
distributed measure
• Mean as additive combination of weighted variables
• Constant variance
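The additive-mean, constant-variance story can be simulated directly. A sketch with made-up parameter values (alpha, beta, and sigma are illustrative, not from the slide): generate heights whose mean is a weighted combination of weight, then check that ordinary least squares recovers the generating values.

```python
import random

random.seed(7)

# Made-up generating values for the sketch.
alpha, beta, sigma = 150.0, 0.9, 5.0
weights = [random.uniform(30, 60) for _ in range(1000)]
# Mean is additive in the predictor; the Gaussian noise has constant variance.
heights = [random.gauss(alpha + beta * w, sigma) for w in weights]

# Ordinary least-squares slope and intercept.
n = len(weights)
wbar = sum(weights) / n
hbar = sum(heights) / n
b = (sum((w - wbar) * (h - hbar) for w, h in zip(weights, heights))
     / sum((w - wbar) ** 2 for w in weights))
a = hbar - b * wbar
print(round(a, 1), round(b, 2))  # close to 150 and 0.9
```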

47. 1809 Bayesian argument for normal error and least-squares estimation

48. Why normal?
• Why are normal (Gaussian)
distributions so common in
statistics?
1. Easy to calculate with
2. Common in nature
3. Very conservative assumption
[Figure: Gaussian density over x, with about 95% of the mass within 2σ of the mean.]

49. [Figure 4.2: Random walks on the soccer field converge to a normal distribution. Panels show the density of positions after 4, 8, and 16 steps, plus sample paths by step number. The more steps are taken, the closer the match between the real distribution and the normal.]
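The random-walk experiment is easy to replicate. A sketch assuming 16 steps of uniform size between -1 and 1: the sum of many small fluctuations ends up approximately Gaussian.

```python
import random

random.seed(3)

# Each walker takes 16 uniform steps; record the final position.
positions = [sum(random.uniform(-1, 1) for _ in range(16))
             for _ in range(10_000)]

mean = sum(positions) / len(positions)
var = sum((x - mean) ** 2 for x in positions) / len(positions)
# Each step has variance 1/3, so 16 steps give variance about 16/3 = 5.33.
sd = var ** 0.5
# For a Gaussian, roughly 95% of walkers end within 2 sd of the mean.
share = sum(abs(x - mean) < 2 * sd for x in positions) / len(positions)
print(round(mean, 2), round(var, 1), round(share, 2))
```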

50.–51. (Figure 4.2, repeated across build slides.)

52. Why normal?
• Processes that produce
normal distributions
• Products of small deviations
• Logarithms of products
Francis Galton’s 1894 “bean machine”
for simulating normal distributions

53. Why normal?
• Ontological perspective
• Processes which add fluctuations result
in dampening
• Damped fluctuations end up Gaussian
• No information left, except mean and
variance
• Can’t infer process from distribution!
• Epistemological perspective
• Know only mean and variance
• Then least surprising and most
conservative (maximum entropy)
distribution is Gaussian
• Nature likes maximum entropy
distributions

54. Why normal?
• Ontological perspective
• Processes which add fluctuations result
in dampening
• Damped fluctuations end up Gaussian
• No information left, except mean and
variance
• Can’t infer process from distribution!
• Epistemological perspective
• Know only mean and variance
• Then least surprising and most
conservative (maximum entropy)
distribution is Gaussian
• Nature likes maximum entropy
distributions

55. Linear models
• Models of normally distributed data
common
• “General Linear Model”: t-test, single
regression, multiple regression,
ANOVA, ANCOVA, MANOVA,
• All the same thing
• Learn strategy, not procedure
Willard Boepple

56. [Figure: scatter plots of height (140–180) against weight (30–60) for growing samples: N = 10, 20, 50, 100, 200, 350.]

57. Regression as a
wicked oracle
• Regression automatically focuses
on the most informative cases
• Cases that don’t help are
automatically ignored
• But not kind — ask carefully

58. Why not just add everything?
• Could just add all available
predictors to model
• “We controlled for...”
• Almost always a bad idea
• Residual confounding
• Overfitting

59. AGE
HEIGHT MATH
?
MATH independent of HEIGHT, conditional on AGE
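The DAG above can be simulated to see this conditional independence. All coefficients here are invented for illustration: age drives both height and math score, so the two are correlated overall but unrelated within an age group.

```python
import random

random.seed(5)

kids = []
for _ in range(20_000):
    A = random.choice([5, 6, 7, 8, 9, 10])
    H = 90 + 5 * A + random.gauss(0, 3)   # height depends only on age
    M = 30 + 5 * A + random.gauss(0, 4)   # math score depends only on age
    kids.append((A, H, M))

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

overall = corr([h for _, h, _ in kids], [m for _, _, m in kids])
within7 = corr([h for a, h, _ in kids if a == 7],
               [m for a, _, m in kids if a == 7])
print(round(overall, 2), round(within7, 2))  # strong overall, near zero within age
```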

60. [Figure: pairwise scatter plots of M (math score, 50–80), A (age, 5–10), and H (height, 110–130).]

61. MATH independent of HEIGHT, conditional on AGE
[Figure: H vs. M scatter plots stratified by age, A = 7, 8, 9, 10; within each age group, height carries no additional information about math score.]

62. LIGHT
POWER SWITCH
SWITCH dependent on POWER, conditional on LIGHT
SWITCH independent of POWER

63. LIGHT POWER SWITCH
ON ON ?
OFF ON ?
SWITCH dependent on POWER, conditional on LIGHT
This effect is known as “collider bias”
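Collider bias is visible in a few lines of simulation. A sketch assuming the light is on exactly when the switch is on and power is available (the 0.5 and 0.9 rates are made up): switch and power are set independently, yet conditioning on the light's state makes them dependent.

```python
import random

random.seed(9)

# Independent causes: switch on with prob 0.5, power available with prob 0.9.
trials = [(random.random() < 0.5, random.random() < 0.9) for _ in range(50_000)]
# The light (the collider) is on only when BOTH switch and power are on.
observations = [(s, p, s and p) for s, p in trials]

def frac_switch_on(subset):
    return sum(s for s, _, _ in subset) / len(subset)

# Condition on the collider: light off, power on. Then the switch MUST be off.
light_off_power_on = [o for o in observations if not o[2] and o[1]]
print(round(frac_switch_on(light_off_power_on), 2))  # 0.0: collider bias in action
```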

64. MARRIED
AGE HAPPY
HAPPY dependent on AGE, conditional on MARRIED
HAPPY independent of AGE

65. Why not just add everything?
• Matters for experiments as well
• Conditioning on post-treatment variables can mask the treatment effect
• Conditioning on pre-treatment variables can still introduce bias
• Good news!
• Causal inference possible in observational settings
• But requires good theory

66. Texts in Statistical Science
Richard McElreath
Statistical Rethinking
A Bayesian Course
with Examples in R and Stan
SECOND EDITION
JUST COUNTING
IMPLICATIONS OF ASSUMPTIONS