Statistical Rethinking Fall 2017 Lecture 02

Chapters 2 and 3

Richard McElreath

October 27, 2017

Transcript

  1. Garden of Forking Data

    [Book excerpt, clipped at the page edges] [...] and white, there are [...] paths that survive. We've considered five different conjectures about the contents of the bag, from zero blue marbles to four blue marbles. For each of these conjectures, we've counted how many sequences (paths through the garden of forking data) could potentially produce the observed data. With ● for a blue marble and ○ for a white one:

    Conjecture   Ways to produce the data
    [○○○○]       0 × 4 × 0 = 0
    [●○○○]       1 × 3 × 1 = 3
    [●●○○]       2 × 2 × 2 = 8
    [●●●○]       3 × 1 × 3 = 9
    [●●●●]       4 × 0 × 4 = 0

    The number of ways to produce the data, for each conjecture, can be computed by first counting the number of paths in each "ring" of the garden and then multiplying. This is just a computational device. It tells us the same thing as Figure [...], without having to draw the garden. The fact that numbers are multiplied during calculation doesn't change the fact that this is still just counting of logically possible paths.
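The counting device can be sketched in a few lines of R. This is only an illustration, assuming a bag of four marbles and the observed draws blue, white, blue (with replacement); the names `n_blue`, `n_white` and `ways` are made up for the sketch and do not come from the lecture.

```r
# Number of blue marbles under each of the five conjectures (bag of 4)
n_blue  <- 0:4
n_white <- 4 - n_blue

# Observed draws: blue, white, blue.
# Paths through the garden = product of the ways to get each draw.
ways <- n_blue * n_white * n_blue
print(ways)
```

Multiplying ring by ring gives the same counts as drawing the whole garden and counting the surviving paths by hand.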
  2. Updating

    Another draw from the bag: ● (blue)

    [Book excerpt, clipped] [...] paths compatible with the data sequence ●○●●. Or you could take the previous counts over conjectures (0, 3, 8, 9, 0) and just update them in light of the new observation. It turns out that these two methods are mathematically identical, as long as the new observation is logically independent of the previous observations. Here's how to do it. First we count the number of ways each conjecture could produce the new observation, ●. Then we multiply each of these new counts by the previous number of ways for each conjecture. In table form:

    Conjecture   Ways to produce ●   Previous counts   New count
    [○○○○]       0                   0                 0 × 0 = 0
    [●○○○]       1                   3                 3 × 1 = 3
    [●●○○]       2                   8                 8 × 2 = 16
    [●●●○]       3                   9                 9 × 3 = 27
    [●●●●]       4                   0                 0 × 4 = 0

    The new counts in the right-hand column summarize all the evidence for each conjecture. As new data arrive, and provided those data are independent of previous observations, the number of logically possible ways for a conjecture to produce all the data up to that point can be computed just by multiplying the new count by the old count. This updating approach amounts to nothing more than asserting that (1) when we have previous information suggesting there are W_prior ways for a conjecture to produce [...]
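A small R sketch of the two routes described here, updating the old counts versus recounting everything, assuming the earlier draws were blue, white, blue and the new draw is blue. The variable names are illustrative only.

```r
# Counts after the first three draws (blue, white, blue)
prior_ways <- c(0, 3, 8, 9, 0)

# Ways each conjecture (0 to 4 blue marbles) can produce one more blue draw
new_ways <- 0:4

# Updating: multiply the old counts by the new counts
updated <- prior_ways * new_ways

# Recounting all four draws (blue, white, blue, blue) from scratch
n_blue <- 0:4
from_scratch <- n_blue * (4 - n_blue) * n_blue * n_blue

print(updated)
print(from_scratch)
```

Both vectors agree, which is the sense in which the two methods are mathematically identical when the new draw is independent of the earlier ones.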
  3. Using other information

    Factory says: blue marbles rare, but every bag contains at least one.

    [Book excerpt, clipped] In this example, the prior data and new data are of the same type: marbles drawn from the bag. But in general, the prior data and new data can be of different types. Suppose for example that someone from the marble factory tells you that blue marbles are rare. So for every 1 bag containing [●●●○], they made 2 bags containing [●●○○] and 3 bags containing [●○○○]. They also ensured that every bag contained at least one blue and one white marble. We can update our counts again:

    Conjecture   Prior ways   Factory count   New count
    [○○○○]       0            0               0 × 0 = 0
    [●○○○]       3            3               3 × 3 = 9
    [●●○○]       16           2               16 × 2 = 32
    [●●●○]       27           1               27 × 1 = 27
    [●●●●]       0            0               0 × 0 = 0

    Now the conjecture [●●○○] is most plausible, but barely better than [●●●○]. Is there a threshold difference in these counts at which we can safely decide that one of the conjectures is the correct one? You'll spend the next chapter exploring that question.

    Rethinking: Original ignorance. Which assumption should we use when there is no previous information [...]
  4. Using other information

    Factory says: blue marbles rare.

    [Book excerpt and factory-count table repeated from slide 3.]
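The factory update works the same way as a new draw. A minimal R sketch, assuming the factory ratios 3 : 2 : 1 for bags with one, two and three blue marbles and the counts 0, 3, 16, 27, 0 carried over from the four draws (names are illustrative):

```r
prior_ways    <- c(0, 3, 16, 27, 0)   # counts after four draws
factory_count <- c(0, 3, 2, 1, 0)     # relative number of bags of each kind

# Information of a different type still enters by multiplication
new_count <- prior_ways * factory_count
print(new_count)

# Standardized, to compare conjectures on a common scale
plausibility <- new_count / sum(new_count)
print(round(plausibility, 2))
```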
  5. Counts to plausibility

    Unglamorous basis of applied probability: Things that can happen more ways are more plausible.

    [Book excerpt, clipped] [...] to standardize is to add up all of the products, one for each value p can take:

      ways p can produce D_new × prior plausibility of p

    And then divide each product by the sum of products:

      plausibility of p after D_new = (ways p can produce D_new × prior plausibility of p) / (sum of products)

    There's nothing special, really, about standardizing to one. Any value will do. But using the number 1 ends up making the mathematics more convenient. Consider again the table from before, now updated using our definitions of p and "plausibility":

    Possible composition   p      Ways to produce data   Plausibility
    [○○○○]                 0      0                      0
    [●○○○]                 0.25   3                      0.15
    [●●○○]                 0.5    8                      0.40
    [●●●○]                 0.75   9                      0.45
    [●●●●]                 1      0                      0

    You can quickly compute these plausibilities in R:

      ways <- c( 3 , 8 , 9 )
      ways/sum(ways)
      [1] 0.15 0.40 0.45
  6. Counts to plausibility

    [Book excerpt on standardizing, table, and R code repeated from slide 5, continuing:]

    These plausibilities are also probabilities: they are non-negative (zero or positive) real numbers that sum to one. And all of the mathematical things you can do with probabilities you can also do with these values. Specifically, each piece of the calculation has a direct partner in applied probability theory. These partners have stereotyped names, so it's worth [...]
  7. Counts to plausibility

    [Book excerpt repeated from slide 5.]

    Plausibility is probability: Set of non-negative real numbers that sum to one.

    Probability theory is just a set of shortcuts for counting possibilities.
  8. Building a model

    • How to use probability to do typical statistical modeling?
      1. Design the model (data story)
      2. Condition on the data (update)
      3. Evaluate the model (critique)
  9. Design > Condition > Evaluate

    • Data story motivates the model
      • How do the data arise?
    • For WLWWWLWLW:
      • Some true proportion of water, p
      • Toss globe, probability p of observing W, 1 − p of L
      • Each toss therefore independent of other tosses
    • Translate data story into probability statements
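One way to check a data story is to run it forward as a simulation. The R sketch below does that under stated assumptions: the value `p_true = 0.7` and the seed are arbitrary choices for illustration, not values from the lecture.

```r
set.seed(2)      # arbitrary seed, only so the sketch is reproducible
p_true <- 0.7    # hypothetical true proportion of water
n_toss <- 9

# Each toss is an independent draw: W with probability p, L with 1 - p
tosses <- sample(c("W", "L"), size = n_toss, replace = TRUE,
                 prob = c(p_true, 1 - p_true))
n_water <- sum(tosses == "W")

print(paste(tosses, collapse = " "))
print(n_water)
```

If simulated sequences look nothing like real globe tosses, the story (and so the model) needs revising before any updating is done.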
  10. Design > Condition > Evaluate

    • Bayesian updating defines optimal learning in small world, converts prior into posterior
    • Give your golem an information state, before the data: Here, an initial confidence in each possible value of p between zero and one
    • Condition on data to update information state: New confidence in each value of p, conditional on data
  11. [Figure: Bayesian updating for the tosses W L W W W L W L W, first panels, with the prior curve labeled. x-axis: p, proportion W (0 to 1); y-axis: plausibility.]

  12. [Figure: same panels, with both the prior and posterior curves labeled. x-axis: p, proportion W (0 to 1); y-axis: plausibility.]

  13. [Figure: nine panels, n = 1 to n = 9, one per toss of W L W W W L W L W. x-axis: proportion water (0 to 1); y-axis: plausibility.]
  14. Design > Condition > Evaluate

    • Data order irrelevant, because golem assumes order irrelevant
      • All-at-once, one-at-a-time, shuffled order all give same posterior
    • Every posterior is a prior for next observation
    • Every prior is posterior of some other inference

    [Figure: nine panels, n = 1 to n = 9, one per toss of W L W W W L W L W. x-axis: proportion water (0 to 1); y-axis: plausibility.]

    Figure 2.5. How a Bayesian model learns. Each toss of the globe produces an observation of water (W) or land (L). The model's estimate of the proportion of water on the globe is a plausibility for every possible value. The lines and curves in this figure are these collections of plausibilities. In each plot, previous plausibilities (dashed curve) are updated in light of the latest [...]
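The claim that order does not matter can be checked numerically. Below is a hedged R sketch on a grid of 101 candidate values of p, assuming a flat prior; `update_seq` is a helper written for this illustration, not a function from the lecture or from any package.

```r
p_grid <- seq(0, 1, length.out = 101)
tosses <- c("W", "L", "W", "W", "W", "L", "W", "L", "W")

# Update a prior one toss at a time; each posterior becomes the next prior
update_seq <- function(obs, prior) {
  post <- prior
  for (o in obs) {
    lik  <- if (o == "W") p_grid else 1 - p_grid
    post <- post * lik
    post <- post / sum(post)
  }
  post
}

flat <- rep(1, length(p_grid))
one_at_a_time <- update_seq(tosses, flat)
reversed      <- update_seq(rev(tosses), flat)   # a different order

# All at once: binomial likelihood for 6 W in 9 tosses
all_at_once <- dbinom(6, size = 9, prob = p_grid) * flat
all_at_once <- all_at_once / sum(all_at_once)

print(all.equal(one_at_a_time, all_at_once))
print(all.equal(one_at_a_time, reversed))
```

The agreement holds because the model treats tosses as independent, so the posterior depends only on the product of the per-toss likelihoods.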
  15. Design > Condition > Evaluate

    • Bayesian inference: Logical answer to a question in the form of a model
      “How plausible is each proportion of water, given these data?”
    • Golem must be supervised
      • Did the golem malfunction?
      • Does the golem’s answer make sense?
      • Does the question make sense?
      • Check sensitivity of answer to changes in assumptions

  16. Construction perspective

    • Build joint model:
      (1) List variables: data & parameters
      (2) Assign data distribution (likelihood)
      (3) Assign parameter distribution (prior)
    • Input: Joint prior
    • Deduce: Joint posterior

  17. Variables: Data & Parameters

    • Variables:
      • n: Number of globe tosses
      • n_W: Number of water landings
      • p: Proportion of water on globe
    • Some are data (n_W, n): can be observed
    • Others are parameters (p): cannot be observed
      • Define targets of inference, what is updated
      • These were the conjectures in the bag example
    • Which are data and which are parameters depends upon your context and question
      • e.g. mark-recapture: know n_W, must infer n, p
  18. Data model: Likelihood

    • Pr(data|assumptions)
    • Defines probability of each observation, conditional “|” on assumptions
      • i.e. relative count of number of ways of seeing data, given a particular conjecture
    • In this case, binomial probability:

    [Book excerpt, clipped] In this case, once we add our assumptions that (1) every toss is independent of the other tosses and (2) the probability of W is the same on every toss, probability theory provides a unique answer, known as the binomial density. This is the common "coin tossing" distribution. And so the probability of observing n_W W's in n tosses, with a probability p of W, is:

    $$\Pr(n_W \mid n, p) = \frac{n!}{n_W!\,(n - n_W)!}\, p^{n_W} (1 - p)^{n - n_W}$$

    Read the above as: The count of W's, n_W, is distributed binomially, with probability p of a W on each toss and n tosses in total.
  19. Data model: Likelihood

    [Book excerpt repeated from slide 18, continuing:]

    $$\Pr(n_W \mid n, p) = \frac{n!}{n_W!\,(n - n_W)!}\, p^{n_W} (1 - p)^{n - n_W}$$

    And the binomial density formula is built into R, so you can easily compute the likelihood of the data (6 W's in 9 tosses) under any value of p with:

      dbinom( 6 , size=9 , prob=0.5 )
      [1] 0.1640625

    Slide callouts on the arguments: 6 is the count of W, size is the number of tosses, prob is the probability of W.

    The count of W’s is distributed binomially, with probability p of a W on each toss and n tosses total.
  20. Data model: Likelihood

    [Book excerpt, revised wording] [...] are no other events. The globe never gets stuck to the ceiling, for example. When we observe a sample of W's and L's of length N (9 in the actual sample), we need to say how likely that exact sample is, out of the universe of potential samples of the same length. That might sound challenging, but it's the kind of thing you get good at very quickly, once you start practicing.

    In this case, once we add our assumptions that (1) every toss is independent of the other tosses and (2) the probability of W is the same on every toss, probability theory provides a unique answer, known as the binomial distribution. This is the common "coin tossing" distribution. And so the probability of observing w W's in n tosses, with a probability p of W, is:

    $$\Pr(w \mid n, p) = \frac{n!}{w!\,(n - w)!}\, p^{w} (1 - p)^{n - w}$$

    Read the above as: The count of "water" observations w is distributed binomially, with probability p of "water" on each toss and n tosses in total.

    And the binomial distribution formula is built into R, so you can easily compute the likelihood of the data (6 W's in 9 tosses) under any value of p with:

      dbinom( 6 , size=9 , prob=0.5 )
      [1] 0.1640625

    That number is the relative number of ways to get 6 W's, holding p at 0.5 and n at 9. So it does the job of counting relative number of paths through the garden. Change the 0.5 to any other value, to see how the value changes.

    Sometimes, likelihoods are written L(p|w, n): the likelihood of p, conditional on w and n. Note however that this notation reverses what is on the left side of the | symbol. Just keep in mind that the job of the likelihood is to tell us the relative number of ways to see the data w, given values for p and n.
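As a quick check that the formula and the built-in function agree, the binomial probability can be written out by hand in R. This is a sketch for illustration; `by_hand`, `built_in` and `p_values` are names chosen here.

```r
n <- 9; w <- 6; p <- 0.5

# Binomial probability written out from the formula
by_hand <- factorial(n) / (factorial(w) * factorial(n - w)) *
  p^w * (1 - p)^(n - w)

# Built-in binomial density
built_in <- dbinom(w, size = n, prob = p)
print(c(by_hand, built_in))

# The same likelihood evaluated at several candidate values of p
p_values <- c(0.1, 0.3, 0.5, 0.7, 0.9)
print(round(dbinom(w, size = n, prob = p_values), 4))
```

Because `dbinom` is vectorized over `prob`, the last line is already most of a grid approximation: it scores many conjectures about p against the same data.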
  21. Parameter model: Prior

    • What the golem believes before the data
    • Likelihood & prior define prior predictive distribution
      • More on this later: it helps us build priors that make sense

    [Figure: globe-tossing updating panels with the prior curve labeled. x-axis: p, proportion W (0 to 1); y-axis: plausibility.]
  22. Parameter model: Prior

    • Globe tossing model, a uniform (flat) prior:

    [Book excerpt, clipped] [...] doesn't resolve the problem of providing a prior, because at the dawn of [...] n = 0, the machine still had an initial estimate for the parameter p: [...] specifying equal confidence in every possible value. You could write the [...] example as:

    $$\Pr(p) = \frac{1}{1 - 0} = 1$$

    The prior is a probability distribution for the parameter. In general, for a uniform prior from a to b, the probability of any point in the interval is 1/(b − a). If you're bothered by the fact that the probability of every value of p is 1, hang on to that feeling. You'll get the explanation in a later chapter. But also know that this is perfectly normal and healthy.

    The prior is what the machine "believes" before it sees the data. It is part of the model, not a reflection necessarily of what you believe. Clearly the likelihood contains many assumptions that are unlikely to be exactly true [...]

    The prior distribution of p is assumed to be uniform in the interval from zero to one.
  23. Prior literature

    • Huge literature on choice of prior
    • Flat prior conventional, but hardly ever best choice
      • Always know something (before data) that can improve inference
      • Are zero and one plausible values for p? Is p < 0.5 as plausible as p > 0.5?
    • There is no “true” prior
      • Just need to do better
    • All above equally true of likelihood

    [Image: Late Cretaceous (90 Mya)]
  24. Posterior

    • Bayesian estimate is always posterior distribution over parameters, Pr(parameters|data)
    • Here: Pr(p|n_W)
    • Compute using Bayes’ theorem:

    [Book excerpt, clipped] [...] Pr(n_W | p) and the prior probability Pr(p). This is like saying that the probability of rain and wind on the same day is equal to the probability of rain, when it's windy, times the probability of wind. This much is just definition. But it's just as true that:

    $$\Pr(n_W, p) = \Pr(p \mid n_W)\,\Pr(n_W)$$

    All I've done is reverse which probability is conditional, on the right-hand side. It is still a true definition. Now since both right-hand sides are equal to the same thing, we can set them equal to one another and solve for the posterior probability, Pr(p | n_W):

    $$\Pr(p \mid n_W) = \frac{\Pr(n_W \mid p)\,\Pr(p)}{\Pr(n_W)}$$

    This is Bayes' theorem. It says that the probability of any particular value of p, considering the data, is equal to the product of the likelihood and prior, divided by this thing Pr(n_W), which I'll call the average likelihood. In word form:

      Posterior = (Likelihood × Prior) / Average Likelihood

    The average likelihood, Pr(n_W), can be confusing. It is commonly called the "evidence" or the "probability of the data," neither of which is a good name [...]
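To make the average likelihood less abstract, here is a hedged R sketch that computes it by numerical integration for the globe data (6 W in 9 tosses), assuming a flat prior on [0, 1]. The evaluation point `p0 = 0.7` is arbitrary, and the variable names are illustrative.

```r
w <- 6; n <- 9

# Average likelihood: the likelihood averaged over the prior on p
avg_lik <- integrate(
  function(p) dbinom(w, size = n, prob = p) * dunif(p, 0, 1),
  lower = 0, upper = 1
)$value

# Posterior density at one value of p, straight from Bayes' theorem
p0 <- 0.7
post_p0 <- dbinom(w, size = n, prob = p0) * dunif(p0, 0, 1) / avg_lik

print(avg_lik)
print(post_p0)
```

With a flat prior the integral works out to 1/(n + 1), and dividing by it is what makes the posterior integrate to one. For this model the result matches the Beta(7, 4) density, which is the analytical posterior here.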
  25. [Figure: three rows of panels, each showing prior × likelihood ∝ posterior, all plotted over p from 0 to 1.]

  26. Computing the posterior

    1. Analytical approach (often impossible)
    2. Grid approximation (very intensive)
    3. Quadratic approximation (approximate)
    4. Markov chain Monte Carlo (intensive)
  27. Grid approximation

    • The posterior is: standardized product of the likelihood and prior.
    • Grid approximation uses finite grid of parameter values instead of continuous space
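A coarse grid shows the idea. The sketch below assumes the globe data (6 W in 9 tosses), a flat prior, and a grid of only 20 points. The comparison against the Beta(7, 4) density is an addition for checking, since that is the known analytical posterior for this model.

```r
p_grid <- seq(from = 0, to = 1, length.out = 20)   # coarse grid of 20 points
prior <- rep(1, length(p_grid))                    # flat prior
likelihood <- dbinom(6, size = 9, prob = p_grid)

posterior <- likelihood * prior
posterior <- posterior / sum(posterior)            # standardize to sum to one

# Analytical posterior on the same grid, standardized the same way
exact <- dbeta(p_grid, 7, 4)
exact <- exact / sum(exact)

print(round(posterior, 3))
print(max(abs(posterior - exact)))
```

At the grid points the two agree up to rounding error. What a coarse grid loses is resolution between points, which is why more grid points are used when the posterior is narrow.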
  28. Quadratic approximation

    • Assume posterior is normally distributed
    • Can estimate with two numbers:
      • Peak of posterior, maximum a posteriori (MAP)
      • Standard deviation of posterior
    • Lots of algorithms
    • With flat priors, same as conventional maximum likelihood estimation
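The two numbers can be found with a general-purpose optimizer. This is a base-R sketch using `optim`, not the tool used in the course, and it assumes the globe data (6 W in 9 tosses) with a flat prior. The bounds 0.001 and 0.999 are an arbitrary choice to keep the log-density finite.

```r
# Log-posterior (up to a constant) for 6 W in 9 tosses, flat prior on p
log_post <- function(p) {
  dbinom(6, size = 9, prob = p, log = TRUE) + dunif(p, 0, 1, log = TRUE)
}

# optim minimizes, so hand it the negative log-posterior
fit <- optim(par = 0.5, fn = function(p) -log_post(p),
             method = "L-BFGS-B", lower = 0.001, upper = 0.999,
             hessian = TRUE)

map_p <- fit$par                       # peak of the posterior (MAP)
sd_p  <- sqrt(1 / fit$hessian[1, 1])   # sd of the approximating normal

print(c(MAP = map_p, sd = sd_p))
```

The curvature at the peak sets the standard deviation: a sharper peak means a larger second derivative and a narrower normal approximation.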
  29. Sampling from the posterior

    • Incredibly useful to sample randomly from the posterior
      • Visualize uncertainty
      • Compute confidence intervals
      • Simulate observations
    • MCMC produces only samples
    • Above all, easier to think with samples

  30. Sampling from the posterior

    • Recipe:
      1. Compute or estimate posterior
      2. Sample with replacement from posterior
      3. Compute stuff from samples
  31. Compute posterior

    • Grid approximation

    R code:

      p <- seq( from=0 , to=1 , length.out=1000 )
      prior <- rep( 1 , 1000 )
      likelihood <- dbinom( 6 , size=9 , prob=p )
      posterior <- likelihood * prior
      posterior <- posterior / sum(posterior)

  32. [Figure: index plots of p, prior, likelihood, and posterior over the 1000 grid points, shown next to the R code from slide 31.]

  33. Sample from posterior

    [Book excerpt, clipped] Now we wish to draw 10 thousand samples from this posterior. Imagine the posterior is a bucket full of parameter values, numbers such as 0.1, 0.7, 0.5, 1, etc. Within the bucket, each value exists in proportion to its posterior probability, such that values near the peak are much more common than those in the tails. We're going to scoop out 10 thousand values from the bucket. Provided the bucket is well mixed, the resulting samples will have the same proportions as the exact posterior density. Here's how you can do this in R, with one line of code:

      samples <- sample( p , prob=posterior , size=1e4 , replace=TRUE )

    The workhorse here is sample, which randomly pulls values from a vector. The vector in this case is p, the grid of parameter values. The resulting samples are displayed in Figure 3.1. On the left, all 10 thousand (1e4) random samples are shown sequentially.

      plot( samples )

    [Figure: left panel, sample index (0 to 10000) against probability of water; right panel, density of samples against proportion water (p).]

    Figure 3.1. Sampling parameter values from the posterior distribution. Left: 10 thousand samples from the posterior implied by the globe tossing data and model. Right: The density of samples (vertical) at each parameter value (horizontal).
  34. Compute stuff

    • Summary tasks
      • How much posterior probability below/above/between specified parameter values?
      • Which parameter values contain 50%/80%/95% of posterior probability? “Confidence” intervals
      • Which parameter value maximizes posterior probability? Minimizes posterior loss? Point estimates
    • You decide the question
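Each summary task reduces to counting or sorting samples. A hedged R sketch, assuming the grid posterior for 6 W in 9 tosses with a flat prior; the seed, the cut points 0.5 and 0.75, and the 80% interval are arbitrary choices for illustration.

```r
set.seed(100)   # arbitrary seed, only so the sketch is reproducible

# Grid posterior for 6 W in 9 tosses, flat prior
p_grid <- seq(from = 0, to = 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid) * rep(1, 1000)
posterior <- posterior / sum(posterior)

# Sample with replacement, in proportion to posterior probability
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = TRUE)

# How much posterior probability lies below 0.5? Between 0.5 and 0.75?
print(mean(samples < 0.5))
print(mean(samples > 0.5 & samples < 0.75))

# Which parameter values bound the central 80% of posterior probability?
print(quantile(samples, c(0.1, 0.9)))

# Point estimates: grid value with the highest posterior, and posterior mean
print(p_grid[which.max(posterior)])
print(mean(samples))
```

Working from samples keeps every question in the same form, a proportion or a quantile of a vector, which carries over unchanged when the samples come from MCMC instead of a grid.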
  35. Homework

    • Practice problems at end of Chapter 3
    • For certificate: Write up solutions to HARD problems: 3H1, 3H2, 3H3, 3H4, 3H5
      Turn them in to me next Friday (3 Nov)
    • Next week: Rethinking regression (Chapter 4)