Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Statistical Rethinking Fall 2017 Lecture 10

Statistical Rethinking Fall 2017 Lecture 10

Week 6, Lecture 10, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. This lecture covers Chapter 8 of the book.

Movies embedded in this blog post: http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/

Richard McElreath

November 29, 2017
Tweet

More Decks by Richard McElreath

Other Decks in Education

Transcript

  1. Goals this week • Introduce Markov chain Monte Carlo •

    Understand the approach • Meet different algorithms • Meet interface to MCMC, map2stan • How to check chains • How to fix common problems • Meet maximum entropy & GLMs
  2. Contract: King Markov must visit each island in proportion to

    its population size. Here’s how he does it...
  3. (1) Flip a coin to choose island on left or

    right. Call it the “proposal” island. 1 2 1 2
  4. (2) Find population of proposal island. (1) Flip a coin

    to choose island on left or right. Call it the “proposal” island. proposal
  5. (2) Find population of proposal island. 1 2 3 4

    5 6 7 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. proposal
  6. (2) Find population of proposal island. 1 2 3 4

    5 6 7 p5 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. proposal
  7. (2) Find population of proposal island. 1 2 3 4

    5 6 7 p5 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. p4 proposal
  8. (2) Find population of proposal island. 1 2 3 4

    5 6 7 p5 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. p4 (4) Move to proposal, with probability = p5 p4 proposal
  9. (2) Find population of proposal island. 1 2 3 4

    5 6 7 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. (4) Move to proposal, with probability = p5 p4 (5) Repeat from (1)
  10. (2) Find population of proposal island. 1 2 3 4

    5 6 7 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. (4) Move to proposal, with probability = p5 p4 (5) Repeat from (1) This procedure ensures visiting each island in proportion to its population, in the long run.
  11. Metropolis algorithm XFFLT PUIFS UJNFT CPVODJOH BSPVOE XJUIPVU BQQBSFOU QBUUFSO

    #VU JO UIF MPOH SVO UIJT QSPDFEVSF HVBSBOUFFT UIBU UIF LJOH XJMM CF GPVOE PO FBDI JTMBOE JO QSPQPSUJPO UP JUT QPQ VMBUJPO TJ[F :PV DBO QSPWF UIJT UP ZPVSTFMG CZ TJNVMBUJOH ,JOH .BSLPWT KPVSOFZ )FSFT B TIPSU QJFDF PG DPEF UP EP UIJT TUPSJOH UIF IJTUPSZ PG UIF LJOHT JTMBOE QPTJUJPOT JO UIF WFDUPS -,0&1&,+0 3 DPEF  +2*Ǒ4""(0 ʆǦ ƾ"ǁ -,0&1&,+0 ʆǦ /"-ǯƽǒ+2*Ǒ4""(0ǰ 2//"+1 ʆǦ ƾƽ #,/ ǯ & &+ ƾǓ+2*Ǒ4""(0 ǰ dz ȅ /" ,/! 2//"+1 -,0&1&,+ -,0&1&,+0DZ&Dz ʆǦ 2//"+1 ȅ #)&- ,&+ 1, $"+"/1" -/,-,0) -/,-,0) ʆǦ 2//"+1 ʀ 0*-)"ǯ ǯǦƾǒƾǰ ǒ 0&7"ʅƾ ǰ ȅ +,4 *(" 02/" %" ),,-0 /,2+! 1%" / %&-")$, &# ǯ -/,-,0) ʆ ƾ ǰ -/,-,0) ʆǦ ƾƽ &# ǯ -/,-,0) ʇ ƾƽ ǰ -/,-,0) ʆǦ ƾ ȅ *,3"ǘ -/,Ǒ*,3" ʆǦ -/,-,0)ǵ 2//"+1 2//"+1 ʆǦ &#")0"ǯ /2+&#ǯƾǰʆ-/,Ǒ*,3" ǒ -/,-,0) ǒ 2//"+1 ǰ Ǵ
  12. 2 4 6 8 10 0 5 10 15 20

    25 island number of weeks after 100 weeks 2 4 6 8 10 0 20 40 60 80 100 island number of weeks after 500 weeks 2 4 6 8 10 0 100 200 300 400 island number of weeks after 2000 weeks 0 2000 4000 6000 8000 10000 2 4 6 8 10 week island
  13. Markov’s chain of visits • Converges to correct proportions, in

    the long run • No matter which island starts • As long as proposals are symmetric • Example of Metropolis algorithm 2 4 6 8 10 0 500 1000 1500 island number of weeks after 10000 weeks
  14. Metropolis and MCMC • Usual use is to draw samples

    from a posterior distribution • “Islands”: parameter values • “Population size”: proportional to posterior probability • Works for any number of dimensions (parameters) • Works for continuous as well as discrete parameters
  15. Metropolis and MCMC • Metropolis: Simple version of Markov chain

    Monte Carlo (MCMC) • Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953) into radicals (by collision, or possibly radiation, as in aromatic hydrocarbons). (b) The molecules dissociate, but the resulting radi- cals recombine without escaping from the liquid cage. (c) The molecules dissociate and escape from the cage. In this case we would not expect them to move more than a few molecular diameters through the dense medium before being thermalized. In accordance with the notation introduced by Burton, Magee, and Samuel,22 the molecules following 22 Burton, Magee, and Samuel, J. Chern. Phys. 20, 760 (1952). THE JOURNAL OF CHEMICAL PHYSICS tions that the ionized H2 0 molecules will become the H2 0t molecules, but this is not likely to be a complete correspondence. In conclusion we would like to emphasize that the qualitative result of this section is not critically de- pendent on the exact values of the physical parameters used. However, this treatment is classical, and a correct treatment must be wave mechanical; therefore the result of this section cannot be taken as an a priori theoretical prediction. The success of the radical diffu- sion model given above lends some plausibility to the occurrence of electron capture as described by this crude calculation. Further work is clearly needed. VOLUME 21, NUMBER 6 JUNE, 1953 Equation of State Calculations by Fast Computing Machines NICHOLAS METROPOLIS, ARIANNA W. ROSENBLUTH, MARSHALL N. ROSENBLUTH, AND AUGUSTA H. TELLER, Los Alamos Scientific Laboratory, Los Alamos, New Mexico AND EDWARD TELLER, * Department of Physics, University of Chicago, Chicago, Illinois (Received March 6, 1953) A general method, suitable for fast computing machines, for investigatiflg such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two-dimensional rigid-sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four-term virial coefficient expansion.
  16. MANIAC: Mathematical Analyzer, Numerical Integrator, and Computer MANIAC: 1000 pounds

    5 kilobytes of memory 70k multiplications/sec Your laptop: 4–7 pounds 2–8 million kilobytes Billions of multiplications/sec
  17. Metropolis and MCMC • Metropolis: Simple version of Markov chain

    Monte Carlo (MCMC) • Chain: Sequence of draws from distribution • Markov chain: History doesn’t matter, just where you are now • Monte Carlo: Random simulation Andrei Andreyevich Markov (Ма́рков) (1856–1922)
  18. Why MCMC? • Sometimes can’t write an integrated posterior •

    Even when can, often cannot use it • Many problems are like this: • Many multilevel models • Networks/phylogenies • Some spatial models • Maximization not a good strategy in high dimensions — must have full distribution
  19. MCMC strategies • Metropolis: Granddaddy of them all • Metropolis-Hastings

    (MH): More general • Gibbs sampling (GS): Efficient version of MH • Hamiltonian Monte Carlo (HMC) • Newer algorithms much better • New methods being developed
  20. Hamiltonian Monte Carlo • Problem with Gibbs sampling (GS) •

    Models with many parameters usually have lots of highly correlated parameters • GS gets stuck, degenerates towards random walk • At best, inefficient because re-explores • Hamiltonian dynamics to the rescue • represent parameter state as particle in n- dimensional space • flick it around frictionless log-posterior • record positions William Rowan Hamilton (1805–1865) Commemorated on Irish Euro coin
  21. Hamiltonian Monte Carlo • Population density curve: log-posterior • Position

    of car: parameter vector • Speed of car: momentum of parameter values • Go fast when high • Go slow when low • Samples of position are samples from posterior
  22. MCMC from Hamiltonian dynamics 0 200 400 600 800 1000

    −3 −2 −1 0 1 2 3 iteration last position coordinate Random−walk Metropolis 0 200 400 600 800 1000 −3 −2 −1 0 1 2 3 iteration last position coordinate Hamiltonian Monte Carlo gure 6: Values for the variable with largest standard deviation for the 100-dimension ample, from a random-walk Metropolis run and an HMC run with L = 150. To mat mputation time, 150 updates were counted as one iteration for random-walk Metropoli te works only when, as here, the distribution is tightly constrained in only one directio Neal. 2011. MCMC from Hamiltonian dynamics
  23. • mc-stan.org • Install RStan 1. Get C++ compiler 2.

    ??? 3. Profit Stanislaw Ulam (1909–1984)
  24. Stan is NUTS • No U-Turn Sampler (NUTS2): Adaptive Hamiltonian

    Monte Carlo • Implemented in Stan (rstan: mc-stan.org) • We’ll use map2stan for now Formula Stan model C++ model Reusable Sampler
  25. HMC example • Back to terrain ruggedness, data(rugged) • Re-estimate

    using Stan • See how it works • See what to expect from healthy Markov chains
  26. MAP estimates !! ʆǦ !DZ ,*-)"1"Ǒ 0"0ǯ!ɢ/$!-- Ǯƿƽƽƽǰ ǒ Dz

    4P ZPV SFNFNCFS UIF PME XBZ XFSF HPJOH UP SFQFBU UIF QSPDFEVSF GPS ĕUUJOH UIF JOUFSBDUJPO NPEFM ćJT NPEFM BJNT UP QSFEJDU MPH(%1 XJUI UFSSBJO SVHHFEOFTT DPOUJOFOU BOE UIF JOUFSBDUJPO PG UIF UXP )FSFT UIF XBZ UP EP JU XJUI *- KVTU MJLF CFGPSF 3 DPEF  *DžǑƾ ʆǦ *-ǯ )&01ǯ ),$Ǯ$!- ʍ !+,/*ǯ *2 ǒ 0&$* ǰ ǒ *2 ʆǦ  ʀ ǹ/2$$"! ʀ ǹ ,+1Ǯ#/&  ʀ ǹ/2$$"!ǹ ,+1Ǯ#/&  ǒ  ʍ !+,/*ǯƽǒƾƽƽǰǒ  ʍ !+,/*ǯƽǒƾƽǰǒ  ʍ !+,/*ǯƽǒƾƽǰǒ  ʍ !+,/*ǯƽǒƾƽǰǒ 0&$* ʍ !2+&#ǯƽǒƾƽǰ ǰ ǒ !1ʅ!! ǰ -/" &0ǯ*DžǑƾǰ "+ 1!"3 ǂǑǂɵ džǁǑǂɵ  džǑƿƿ ƽǑƾǁ džǑƽƽ džǑǁǁ  ǦƽǑƿƽ ƽǑƽDž ǦƽǑǀƿ ǦƽǑƽDž  ǦƾǑdžǂ ƽǑƿƿ ǦƿǑǀƾ ǦƾǑǂdž  ƽǑǀdž ƽǑƾǀ ƽǑƾdž ƽǑǃƽ 0&$* ƽǑdžǀ ƽǑƽǂ ƽǑDžǂ ƾǑƽƾ +VTU BT ZPV TBX JO UIF QSFWJPVT DIBQUFS  1SFQBSBUJPO #VU OPX XFMM BMTP ĕU UIJT NPEFM VTJOH )BNJMUPOJBO .POUF $BSMP ćJT
  27. ɢ ,+1Ǯ#/& Ǔ &+1 ƾ ƽ ƽ ƽ ƽ ƽ

    ƽ ƽ ƽ ƾ ǑǑǑ ćF EBUB GSBNF !!Ǒ1/&* DPOUBJOT POMZ UIF UISFF WBSJBCMFT XFSF VTJOH  &TUJNBUJPO /PX QSPWJEFE ZPV IBWF UIF /01+ QBDLBHF JOTUBMMFE NDTUBOPSH ZPV DBO HFU TBNQMFT GSPN UIF QPTUFSJPS EJTUSJCVUJPO XJUI UIJT DPEF 3 DPEF  *DžǑƾ01+ ʆǦ *-ƿ01+ǯ )&01ǯ ),$Ǯ$!- ʍ !+,/*ǯ *2 ǒ 0&$* ǰ ǒ *2 ʆǦ  ʀ ǹ/2$$"! ʀ ǹ ,+1Ǯ#/&  ʀ ǹ/2$$"!ǹ ,+1Ǯ#/&  ǒ  ʍ !+,/*ǯƽǒƾƽƽǰǒ  ʍ !+,/*ǯƽǒƾƽǰǒ  ʍ !+,/*ǯƽǒƾƽǰǒ  ʍ !+,/*ǯƽǒƾƽǰǒ 0&$* ʍ ! 2 %6ǯƽǒƿǰ ǰ ǒ !1ʅ!!Ǒ1/&* ǰ ćFSF JT POF DIBOHF UP OPUF IFSF CVU UP FYQMBJO MBUFS ćF VOJGPSN QSJPS PO 0&$* IBT CFFO DIBOHFE UP B IBMG$Įłİĵņ QSJPS ćF $BVDIZ EJTUSJCVUJPO JT B VTFGVM UIJDLUBJMFE QSPCBCJMJUZ EJTUSJCVUJPO SFMBUFE UP UIF 4UVEFOU U EJTUSJCVUJPO ćFSFT BO 0WFSUIJOLJOH CPY BCPVU JU MBUFS JO UIF DIBQUFS QBHF   :PV DBO UIJOL PG JU BT B XFBLMZ SFHVMBSJ[JOH QSJPS GPS TUBOEBSE EFWJBUJPOT 8FMM VTF JU BHBJO MBUFS JO UIF DIBQUFS "OE ZPVMM IBWF NBOZ DIBODFT UP HFU VTFE UP JU BT UIF CPPL DPOUJOVFT #VU OPUF UIBU JU JT OPU OFDFTTBSZ UP VTF B IBMG$BVDIZ ćF VOJGPSN QSJPS XJMM TUJMM XPSL BOE B TJNQMF FYQPOFOUJBM QSJPS JT BMTP BQQSPQSJBUF *O UIJT FYBNQMF BT JO NBOZ UIFSF JT TP HMC estimates
  28. What is “Cauchy”? • One of many things named after

    Augustin-Louis Cauchy (KO-shee) • Ratio of two Gaussian samples • Useful distribution with thick tails • Parameters: location of mode, scale • Mean and variance undefined • Related to Lévy flights Baron Augustin-Louis Cauchy (1789–1857)
  29. Extract samples • Once you have samples, they are just

    samples  ǦƾǑdžǁ ƽǑƿƿ ǦƿǑƿdž ǦƾǑǂDž ƾƾǂǁ ƾ  ƽǑǀdž ƽǑƾǀ ƽǑƾDŽ ƽǑǂdž ƾƾǁǁ ƾ 0&$* ƽǑdžǂ ƽǑƽǂ ƽǑDžDŽ ƾǑƽǀ ƾdžǃƽ ƾ ćF /"0*-)" GVODUJPO XJMM BMTP SFDPNQVUF %*$ BOE 8"*$ GPS ZPV VTJOH POMZ UIF TBNQMFT :PV DBO BMTP BEE UIF ,/"0 BSHVNFOU UP BOZ PSJHJOBM *-ƿ01+ DBMM *U XJMM B NBUJDBMMZ QBSBMMFMJ[F UIF DIBJOT  7JTVBMJ[BUJPO #Z QMPUUJOH UIF TBNQMFT ZPV DBO HFU B EJSFDU BQQSFDJBUJPO GPS (BVTTJBO RVBESBUJD UIF BDUVBM QPTUFSJPS EFOTJUZ IBT UVSOFE PVU UP CF 5P QVMM PVU UIF T QMFT VTF 3 DPEF  -,01 ʆǦ "51/ 1Ǒ0*-)"0ǯ *DžǑƾ01+ ǰ 01/ǯ-,01ǰ &01 ,# ǂ ɢ  Ǔ +2* DZƾǓƾƽƽƽǯƾ!ǰDz džǑǀ džǑǀǁ džǑƿƾ džǑǁǀ džǑƿDž ǑǑǑ ɢ / Ǔ +2* DZƾǓƾƽƽƽǯƾ!ǰDz ǦƽǑƾǀǀ ǦƽǑƿƾǁ ǦƽǑƿƾǂ ǦƽǑƿƿdž ǦƽǑƿƽdž ǑǑǑ ɢ  Ǔ +2* DZƾǓƾƽƽƽǯƾ!ǰDz ǦƾǑdžƾ ǦƿǑƿǂ ǦƿǑƿǃ ǦƾǑdžDž ǦƾǑdždž ǑǑǑ ɢ / Ǔ +2* DZƾǓƾƽƽƽǯƾ!ǰDz ƽǑƾǀǀ ƽǑǀǃDŽ ƽǑǂǀǀ ƽǑƿǂǁ ƽǑǁǃDž ǑǑǑ ɢ 0&$*Ǔ +2* DZƾǓƾƽƽƽǯƾ!ǰDz ƽǑdžDžDž ƽǑdžǁdž ƽǑdžƽǁ ƽǑdžDŽǃ ƽǑdžǀǁ ǑǑǑ /PUF UIBU -,01 IFSF JT B )&01 OPU B !1Ǒ#/*" ćJT GBDU XJMM CF WFSZ VTFGVM UP VTF PO XIFO XF FODPVOUFS NVMUJMFWFM NPEFMT 'PS OPX JG JU EPFTOU CFIBWF MJLF ZPV FYQFDU ZPV DBO DPFSDF JU UP B EBUB GSBNF XJUI -,01ʆǦ0Ǒ!1Ǒ#/*"ǯ-,01ǰ ćFSF BSF POMZ  TBNQMFT GPS FBDI QBSBNFUFS CFDBVTF UIBUT UIF EFGBVMU *O UIJT DBTF JUT FOPVHI ćFSFT
  30. Check the chain • Sometimes it doesn’t work • First

    and most important check: trace plot  &"4: ).$ ƽ  'ĶĴłĿIJ ƐƎ 5SBDF QMPU PG UIF .BSLPW DIBJO GSPN UIF SVHHFEOFTT NPEFM (ǃǏƼ./) ćJT JT B DMFBO IFBMUIZ .BSLPW DIBJO CPUI TUBUJPOBSZ BOE warmup ĽĹļŁ " USBDF QMPU NFSFMZ QMPUT UIF TBNQMFT JO TFRVFOUJBM PSEFS KPJO .BSLPWT QBUI UISPVHI UIF JTMBOET JO UIF NFUBQIPS BU UIF TUBSU PG UIF D USBDF QMPU PG FBDI QBSBNFUFS DBO IFMQ UP EJBHOPTF NBOZ DPNNPO QSP DPNF UP SFDPHOJ[F B IFBMUIZ GVODUJPOJOH .BSLPW DIBJO RVJDL DIFDLT B MPU PG QFBDF PG NJOE " USBDF QMPU JTOU UIF MBTU UIJOH BOBMZTUT EP UP JO #VU JUT OFBSMZ BMXBZT UIF ĕSTU *O UIF UFSSBJO SVHHFEOFTT FYBNQMF UIF USBDF QMPU TIPXT B WFSZ I XJUI +'*/ǿ(Ǖǡǎ./)Ȁ ćF SFTVMU JT TIPXO JO 'ĶĴłĿIJ Ɛƌ &BDI QMPU JO UIJT ĕHVSF JT TJNJM ZPV KVTU VTFE GPS FYBNQMF +'*/ǿ+*./ɶǢ/4+ ʙǫ'ǫȀ CVU XJUI TPNF MBCFMJOH UP IFMQ PVU :PV DBO UIJOL PG UIF [JH[BHHJOH USBDF PG FBDI UIF DIBJO UPPL UISPVHI FBDI EJNFOTJPO PG QBSBNFUFS TQBDF *UT FBT UIF [PPNFE QMPU JO UIF MPXFS SJHIU DPSOFS XIJDI EJTQMBZT POMZ UIF XBSNVQ ćF HSBZ SFHJPO JO FBDI QMPU UIF ĕSTU POFUIPVTBOE TBNQMFT NBS QMFT %VSJOH BEBQUBUJPO UIF .BSLPW DIBJO JT MFBSOJOH UP NPSF FďDJ QPTUFSJPS EJTUSJCVUJPO 4P UIFTF TBNQMFT BSF OPU OFDFTTBSJMZ SFMJBCMF UP BSF BVUPNBUJDBMMZ EJTDBSEFE CZ 3/-/ǡ.(+' . XIJDI SFUVSOT PO JO UIF XIJUF SFHJPOT PG 'ĶĴłĿIJ Ɛƌ