From data story to model • How do the data arise? • For the sequence W L W W W L W L W: • Some true proportion of water, p • Toss the globe: probability p of observing W, 1 − p of L • Each toss therefore independent of the other tosses • Translate the data story into probability statements
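A minimal R sketch of this data story, for illustration only: the value p_true = 0.7 and the seed are assumptions, chosen just to show how a W/L sequence could arise from independent tosses.

```r
# Sketch: simulate the data story. p_true is an assumed value, used only
# to illustrate how a sequence of W/L observations could arise.
set.seed(11)                                   # arbitrary seed, for reproducibility
p_true <- 0.7                                  # assumed true proportion of water
tosses <- rbinom(9, size = 1, prob = p_true)   # nine independent tosses: 1 = W, 0 = L
paste(ifelse(tosses == 1, "W", "L"), collapse = " ")
```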
Bayesian updating: learning in the small world, converting the prior into a posterior • Give your golem an information state before the data: here, an initial confidence in each possible value of p between zero and one • Condition on the data to update the information state: a new confidence in each value of p, conditional on the data
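One way to carry out this update is grid approximation. The sketch below is not the course's exact code; the grid size is an arbitrary choice, and the data (6 W in 9 tosses) are the counts from the sequence above.

```r
# Sketch of the update by grid approximation: a flat prior over p,
# conditioned on 6 W in 9 tosses.
p_grid     <- seq(0, 1, length.out = 1000)      # candidate values of p
prior      <- rep(1, length(p_grid))            # equal initial confidence
likelihood <- dbinom(6, size = 9, prob = p_grid)
posterior  <- prior * likelihood
posterior  <- posterior / sum(posterior)        # normalize so it sums to 1
plot(p_grid, posterior, type = "l",
     xlab = "proportion water", ylab = "plausibility")
```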
[Figure: Bayesian updating of plausibility for the sequence W L W W W L W L W. Each panel (n = 0 through n = 9) plots plausibility against the proportion of water p, showing the prior (dashed) and the updated posterior.]
The golem assumes order is irrelevant • All-at-once, one-at-a-time, or shuffled order all give the same posterior (see the sketch below) • Every posterior is the prior for the next observation • Every prior is the posterior of some other inference
[Figure 2.5. How a Bayesian model learns. Each toss of the globe produces an observation of water (W) or land (L). The model's estimate of the proportion of water on the globe is a plausibility for every possible value. The lines and curves in this figure are these collections of plausibilities. In each plot, the previous plausibilities (dashed curve) are updated in light of the latest observation to produce a new set of plausibilities (solid curve).]
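The order-invariance claim can be checked directly. This sketch (grid size and variable names are my own choices) updates the posterior one toss at a time and compares the result with conditioning on all nine tosses at once.

```r
# Sketch: one-at-a-time updating versus all-at-once, for W L W W W L W L W.
tosses <- c(1, 0, 1, 1, 1, 0, 1, 0, 1)          # W = 1, L = 0
p_grid <- seq(0, 1, length.out = 1000)

# One toss at a time: each posterior becomes the prior for the next toss.
post_seq <- rep(1, length(p_grid))              # flat prior before any data
for (x in tosses) {
  post_seq <- post_seq * dbinom(x, size = 1, prob = p_grid)
  post_seq <- post_seq / sum(post_seq)
}

# All at once: condition on the total count of W in n tosses.
post_all <- dbinom(sum(tosses), size = length(tosses), prob = p_grid)
post_all <- post_all / sum(post_all)

max(abs(post_seq - post_all))                   # effectively zero
```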
The answer to a question in the form of a model • The golem must be supervised • Did the golem malfunction? • Does the golem’s answer make sense? • Does the question make sense? • Check sensitivity of the answer to changes in assumptions
Likelihood: the probability of the data, conditional (“|”) on assumptions • i.e., the relative count of the number of ways of seeing the data, given a particular conjecture • In this case, the binomial probability: once we add the assumptions that every toss is independent of the other tosses and that the probability of W is the same on every toss, probability theory provides a unique answer, known as the binomial density, the common “coin tossing” distribution. The probability of observing n_W waters in n tosses, with probability p of W on each toss, is

$$\Pr(n_W \mid n, p) = \frac{n!}{n_W!\,(n - n_W)!}\; p^{n_W} (1 - p)^{\,n - n_W}$$
The binomial density is built into R, so you can easily compute the likelihood of the data (6 W's in 9 tosses) under any value of p with:

dbinom( 6 , size=9 , prob=0.5 )   # 0.1640625

Here n_W is the count of W's, n the number of tosses, and p the probability of W on each toss. The count of W's, n_W, is distributed binomially, with probability p of a W on each toss and n tosses in total.
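As a quick check that the formula and dbinom() agree, the arithmetic can be done by hand with choose(), using the same numbers as above.

```r
# Sketch: the binomial formula by hand, compared with dbinom(),
# for 6 W in 9 tosses at p = 0.5.
n <- 9; n_W <- 6; p <- 0.5
choose(n, n_W) * p^n_W * (1 - p)^(n - n_W)      # 0.1640625
dbinom(n_W, size = n, prob = p)                 # 0.1640625
```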
Some variables are data (n_W, n) • Others are parameters (p) • Parameters define the targets of inference, what is updated • These were the conjectures in the bag example • Which are data and which are parameters depends upon your context and question • e.g., mark-recapture: know n_W, must infer n and p
Prior: an initial probability for each value of p • Likelihood & prior define the golem’s perspective on the data
Even at the dawn of n = 0, the machine has an initial state of information for the parameter p: here, equal confidence in every possible value, which for this example you could write as

$$\Pr(p) = \frac{1}{1 - 0} = 1$$

The prior is a probability distribution for the parameter. In general, for a uniform prior from a to b, the probability of any point in the interval is 1/(b − a). If you are bothered by the fact that the probability of every value of p is 1, hang on; you will get the explanation in a later chapter, but know that it is perfectly normal and healthy. The prior is what the machine “believes” before it sees the data. It is part of the model, not necessarily a reflection of what you believe. The prior probability of p is assumed to be uniform on the interval from zero to one.
A flat prior is conventional, but hardly ever the best choice • You always know something (before the data) that can improve inference • Are zero and one plausible values for p? Is p < 0.5 as plausible as p > 0.5? • Don’t have to get it exactly right; just need to improve on the flat prior (one possible encoding is sketched below) [Image: the Earth in the Late Cretaceous, ~90 Mya]
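As one illustration, the step-shaped prior below is an assumption chosen only for the example, not a recommendation: it assigns zero plausibility to p < 0.5 and is compared with the flat prior on the same grid.

```r
# Sketch: a prior that rules out p < 0.5, compared with the flat prior.
p_grid     <- seq(0, 1, length.out = 1000)
likelihood <- dbinom(6, size = 9, prob = p_grid)

prior_flat <- rep(1, length(p_grid))
prior_step <- ifelse(p_grid < 0.5, 0, 1)        # zero plausibility below 0.5

post_flat <- prior_flat * likelihood; post_flat <- post_flat / sum(post_flat)
post_step <- prior_step * likelihood; post_step <- post_step / sum(post_step)

plot(p_grid, post_step, type = "l",
     xlab = "proportion water", ylab = "plausibility")
lines(p_grid, post_flat, lty = 2)               # flat-prior posterior, dashed
```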
Approximate the posterior estimate with two numbers: • Peak of the posterior, the maximum a posteriori (MAP) estimate • Standard deviation of the posterior • Lots of algorithms can compute these (one is sketched below) • With flat priors, same as conventional maximum likelihood estimation
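A minimal sketch of this idea using base R's optim(), not the course's own helper; the function and variable names here are mine. It finds the posterior mode and takes the standard deviation from the curvature at the peak (with a flat prior, the log prior is constant and drops out).

```r
# Sketch: quadratic approximation of the posterior for 6 W in 9 tosses.
neg_log_post <- function(p) -dbinom(6, size = 9, prob = p, log = TRUE)
fit <- optim(par = 0.5, fn = neg_log_post, method = "L-BFGS-B",
             lower = 1e-6, upper = 1 - 1e-6, hessian = TRUE)
map_p <- fit$par                      # posterior mode (MAP), about 0.67
sd_p  <- sqrt(1 / fit$hessian[1, 1])  # SD from the curvature, about 0.16
c(map_p, sd_p)
```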
Work with samples from the posterior • Visualize uncertainty • Compute confidence intervals • Simulate observations • MCMC produces only samples • Above all, it is easier to think with samples (sketched below)
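A sketch of drawing such samples from a grid-approximated posterior; the grid size and number of samples are arbitrary choices.

```r
# Sketch: samples from the grid-approximated posterior (flat prior, 6 W in 9).
p_grid    <- seq(0, 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid)
posterior <- posterior / sum(posterior)
samples   <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)
hist(samples, xlab = "proportion water", main = "")
```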
How much posterior probability lies below/above/between specified parameter values? • Which parameter values contain 50%/80%/95% of posterior probability? “Confidence” intervals • Which parameter value maximizes posterior probability? Minimizes posterior loss? Point estimates • You decide the question (examples sketched below)
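A sketch of answering a few of these questions with posterior samples; the setup is repeated so the block runs on its own, and the particular boundaries and intervals shown are examples, not recommendations.

```r
# Sketch: summaries computed from posterior samples.
p_grid    <- seq(0, 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid)
posterior <- posterior / sum(posterior)
samples   <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

mean(samples < 0.5)                   # posterior probability that p < 0.5
quantile(samples, c(0.25, 0.75))      # middle 50% percentile interval
quantile(samples, c(0.10, 0.90))      # middle 80% percentile interval
p_grid[which.max(posterior)]          # parameter value maximizing the posterior
```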
Even the best model might make terrible predictions • Also want to check model assumptions • Predictive checks: can use samples from the posterior to simulate observations (sketched below) • NB: the assumption about how observations are sampled is itself an assumption
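A sketch of one such predictive check; the simulation size is arbitrary. For each sampled value of p, it simulates the count of W in nine new tosses, so the predictions carry the posterior uncertainty in p.

```r
# Sketch: posterior predictive simulation of new globe-tossing data.
p_grid    <- seq(0, 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid)
posterior <- posterior / sum(posterior)
samples   <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

w_sim <- rbinom(1e4, size = 9, prob = samples)  # predicted counts of W in 9 tosses
table(w_sim)                                    # compare with the observed count, 6
```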
[Figure 3.4 (merged panels): a distribution over the probability of water (0 to 1) with three values marked, A: p = 0.38, B: p = 0.64, C: p = 0.89, and for each value a frequency histogram of the number of water samples (0 to 9) simulated at that p.]
• No universally best way to evaluate adequacy of model-based predictions • No way to justify always using a threshold like 5% • Good predictive checks always depend upon purpose and imagination “It would be very nice to have a formal apparatus that gives us some ‘optimal’ way of recognizing unusual phenomena and inventing new classes of hypotheses [...]; but this remains an art for the creative human mind.” —E.T. Jaynes (1922–1998)