Statistical Rethinking Fall 2017 Lecture 10

Week 6: Markov Chain Monte Carlo Richard McElreath Statistical Rethinking

Goals this week • Introduce Markov chain Monte Carlo •
Understand the approach • Meet different algorithms • Meet interface to MCMC, map2stan • How to check chains • How to fix common problems • Meet maximum entropy & GLMs

King Markov

The Metropolis Archipelago

Contract: King Markov must visit each island in proportion to
its population size. Here’s how he does it...

(1) Flip a coin to choose island on left or
right. Call it the “proposal” island. 1 2 1 2

(2) Find population of proposal island. (1) Flip a coin
to choose island on left or right. Call it the “proposal” island. proposal

(2) Find population of proposal island. 1 2 3 4
5 6 7 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. proposal

5 6 7 p5 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. proposal

5 6 7 p5 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. p4 proposal

5 6 7 p5 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. p4 (4) Move to proposal, with probability = p5 p4 proposal

5 6 7 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. (4) Move to proposal, with probability = p5 p4 (5) Repeat from (1)

5 6 7 (1) Flip a coin to choose island on left or right. Call it the “proposal” island. (3) Find population of current island. (4) Move to proposal, with probability = p5 p4 (5) Repeat from (1) This procedure ensures visiting each island in proportion to its population, in the long run.

Metropolis algorithm XFFLT PUIFS UJNFT CPVODJOH BSPVOE XJUIPVU BQQBSFOU QBUUFSO
#VU JO UIF MPOH SVO UIJT QSPDFEVSF HVBSBOUFFT UIBU UIF LJOH XJMM CF GPVOE PO FBDI JTMBOE JO QSPQPSUJPO UP JUT QPQ VMBUJPO TJ[F :PV DBO QSPWF UIJT UP ZPVSTFMG CZ TJNVMBUJOH ,JOH .BSLPWT KPVSOFZ )FSFT B TIPSU QJFDF PG DPEF UP EP UIJT TUPSJOH UIF IJTUPSZ PG UIF LJOHT JTMBOE QPTJUJPOT JO UIF WFDUPS -,0&1&,+0 3 DPEF +2*Ǒ4""(0 ʆǦ ƾ"ǁ -,0&1&,+0 ʆǦ /"-ǯƽǒ+2*Ǒ4""(0ǰ 2//"+1 ʆǦ ƾƽ #,/ ǯ & &+ ƾǓ+2*Ǒ4""(0 ǰ ǳ ȅ /" ,/! 2//"+1 -,0&1&,+ -,0&1&,+0Ǳ&ǲ ʆǦ 2//"+1 ȅ #)&- ,&+ 1, $"+"/1" -/,-,0) -/,-,0) ʆǦ 2//"+1 ʀ 0*-)"ǯ ǯǦƾǒƾǰ ǒ 0&7"ʅƾ ǰ ȅ +,4 *(" 02/" %" ),,-0 /,2+! 1%" / %&-")$, &# ǯ -/,-,0) ʆ ƾ ǰ -/,-,0) ʆǦ ƾƽ &# ǯ -/,-,0) ʇ ƾƽ ǰ -/,-,0) ʆǦ ƾ ȅ *,3"ǘ -/,Ǒ*,3" ʆǦ -/,-,0)ǵ 2//"+1 2//"+1 ʆǦ &#")0"ǯ /2+&#ǯƾǰʆ-/,Ǒ*,3" ǒ -/,-,0) ǒ 2//"+1 ǰ Ǵ

Markov’s chain of visits 0 200 400 600 800 1000
2 4 6 8 10 week island

2 4 6 8 10 0 5 10 15 20
25 island number of weeks after 100 weeks 2 4 6 8 10 0 20 40 60 80 100 island number of weeks after 500 weeks 2 4 6 8 10 0 100 200 300 400 island number of weeks after 2000 weeks 0 2000 4000 6000 8000 10000 2 4 6 8 10 week island

Markov’s chain of visits • Converges to correct proportions, in
the long run • No matter which island starts • As long as proposals are symmetric • Example of Metropolis algorithm 2 4 6 8 10 0 500 1000 1500 island number of weeks after 10000 weeks

Metropolis and MCMC • Usual use is to draw samples
from a posterior distribution • “Islands”: parameter values • “Population size”: proportional to posterior probability • Works for any number of dimensions (parameters) • Works for continuous as well as discrete parameters

0 2000 4000 6000 8000 10000 1.0 1.5 2.0 2.5
Index p

Metropolis and MCMC • Metropolis: Simple version of Markov chain
Monte Carlo (MCMC) • Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953) into radicals (by collision, or possibly radiation, as in aromatic hydrocarbons). (b) The molecules dissociate, but the resulting radicals recombine without escaping from the liquid cage. (c) The molecules dissociate and escape from the cage. In this case we would not expect them to move more than a few molecular diameters through the dense medium before being thermalized. In accordance with the notation introduced by Burton, Magee, and Samuel,22 the molecules following 22 Burton, Magee, and Samuel, J. Chern. Phys. 20, 760 (1952). THE JOURNAL OF CHEMICAL PHYSICS tions that the ionized H2 0 molecules will become the H2 0t molecules, but this is not likely to be a complete correspondence. In conclusion we would like to emphasize that the qualitative result of this section is not critically de- pendent on the exact values of the physical parameters used. However, this treatment is classical, and a correct treatment must be wave mechanical; therefore the result of this section cannot be taken as an a priori theoretical prediction. The success of the radical diffu- sion model given above lends some plausibility to the occurrence of electron capture as described by this crude calculation. Further work is clearly needed. VOLUME 21, NUMBER 6 JUNE, 1953 Equation of State Calculations by Fast Computing Machines NICHOLAS METROPOLIS, ARIANNA W. ROSENBLUTH, MARSHALL N. ROSENBLUTH, AND AUGUSTA H. TELLER, Los Alamos Scientific Laboratory, Los Alamos, New Mexico AND EDWARD TELLER, * Department of Physics, University of Chicago, Chicago, Illinois (Received March 6, 1953) A general method, suitable for fast computing machines, for investigatiflg such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two-dimensional rigid-sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four-term virial coefficient expansion.

MANIAC: Mathematical Analyzer, Numerical Integrator, and Computer MANIAC: 1000 pounds
5 kilobytes of memory 70k multiplications/sec Your laptop: 4–7 pounds 2–8 million kilobytes Billions of multiplications/sec

Metropolis and MCMC • Metropolis: Simple version of Markov chain
Monte Carlo (MCMC) • Chain: Sequence of draws from distribution • Markov chain: History doesn’t matter, just where you are now • Monte Carlo: Random simulation Andrei Andreyevich Markov (Ма́рков) (1856–1922)

Why MCMC? • Sometimes can’t write an integrated posterior •
Even when can, often cannot use it • Many problems are like this: • Many multilevel models • Networks/phylogenies • Some spatial models • Maximization not a good strategy in high dimensions — must have full distribution

MCMC strategies • Metropolis: Granddaddy of them all • Metropolis-Hastings
(MH): More general • Gibbs sampling (GS): Efficient version of MH • Hamiltonian Monte Carlo (HMC) • Newer algorithms much better • New methods being developed

https://chi-feng.github.io/mcmc-demo/

Hamiltonian Monte Carlo • Problem with Gibbs sampling (GS) •
Models with many parameters usually have lots of highly correlated parameters • GS gets stuck, degenerates towards random walk • At best, inefficient because re-explores • Hamiltonian dynamics to the rescue • represent parameter state as particle in n- dimensional space • flick it around frictionless log-posterior • record positions William Rowan Hamilton (1805–1865) Commemorated on Irish Euro coin

King Monty’s Kingdom

King Monty’s Kingdom relative speed

Hamiltonian Monte Carlo • Population density curve: log-posterior • Position
of car: parameter vector • Speed of car: momentum of parameter values • Go fast when high • Go slow when low • Samples of position are samples from posterior

MCMC from Hamiltonian dynamics 0 200 400 600 800 1000
−3 −2 −1 0 1 2 3 iteration last position coordinate Random−walk Metropolis 0 200 400 600 800 1000 −3 −2 −1 0 1 2 3 iteration last position coordinate Hamiltonian Monte Carlo gure 6: Values for the variable with largest standard deviation for the 100-dimension ample, from a random-walk Metropolis run and an HMC run with L = 150. To mat mputation time, 150 updates were counted as one iteration for random-walk Metropoli te works only when, as here, the distribution is tightly constrained in only one directio Neal. 2011. MCMC from Hamiltonian dynamics

• mc-stan.org • Install RStan 1. Get C++ compiler 2.
??? 3. Profit Stanislaw Ulam (1909–1984)

Stan is NUTS • No U-Turn Sampler (NUTS2): Adaptive Hamiltonian
Monte Carlo • Implemented in Stan (rstan: mc-stan.org) • We’ll use map2stan for now Formula Stan model C++ model Reusable Sampler

HMC example • Back to terrain ruggedness, data(rugged) • Re-estimate
using Stan • See how it works • See what to expect from healthy Markov chains

MAP estimates !! ʆǦ !Ǳ ,*-)"1"Ǒ 0"0ǯ!ɢ/$!-- Ǯƿƽƽƽǰ ǒ ǲ
4P ZPV SFNFNCFS UIF PME XBZ XFSF HPJOH UP SFQFBU UIF QSPDFEVSF GPS ĕUUJOH UIF JOUFSBDUJPO NPEFM ćJT NPEFM BJNT UP QSFEJDU MPH(%1 XJUI UFSSBJO SVHHFEOFTT DPOUJOFOU BOE UIF JOUFSBDUJPO PG UIF UXP )FSFT UIF XBZ UP EP JU XJUI *- KVTU MJLF CFGPSF 3 DPEF *ǅǑƾ ʆǦ *-ǯ )&01ǯ ),$Ǯ$!- ʍ !+,/*ǯ *2 ǒ 0&$* ǰ ǒ *2 ʆǦ ʀ ǹ/2$$"! ʀ ǹ ,+1Ǯ#/& ʀ ǹ/2$$"!ǹ ,+1Ǯ#/& ǒ ʍ !+,/*ǯƽǒƾƽƽǰǒ ʍ !+,/*ǯƽǒƾƽǰǒ ʍ !+,/*ǯƽǒƾƽǰǒ ʍ !+,/*ǯƽǒƾƽǰǒ 0&$* ʍ !2+&#ǯƽǒƾƽǰ ǰ ǒ !1ʅ!! ǰ -/" &0ǯ*ǅǑƾǰ "+ 1!"3 ǂǑǂɵ ǆǁǑǂɵ ǆǑƿƿ ƽǑƾǁ ǆǑƽƽ ǆǑǁǁ ǦƽǑƿƽ ƽǑƽǅ ǦƽǑǀƿ ǦƽǑƽǅ ǦƾǑǆǂ ƽǑƿƿ ǦƿǑǀƾ ǦƾǑǂǆ ƽǑǀǆ ƽǑƾǀ ƽǑƾǆ ƽǑǃƽ 0&$* ƽǑǆǀ ƽǑƽǂ ƽǑǅǂ ƾǑƽƾ +VTU BT ZPV TBX JO UIF QSFWJPVT DIBQUFS 1SFQBSBUJPO #VU OPX XFMM BMTP ĕU UIJT NPEFM VTJOH )BNJMUPOJBO .POUF $BSMP ćJT

ɢ ,+1Ǯ#/& Ǔ &+1 ƾ ƽ ƽ ƽ ƽ ƽ
ƽ ƽ ƽ ƾ ǑǑǑ ćF EBUB GSBNF !!Ǒ1/&* DPOUBJOT POMZ UIF UISFF WBSJBCMFT XFSF VTJOH &TUJNBUJPO /PX QSPWJEFE ZPV IBWF UIF /01+ QBDLBHF JOTUBMMFE NDTUBOPSH ZPV DBO HFU TBNQMFT GSPN UIF QPTUFSJPS EJTUSJCVUJPO XJUI UIJT DPEF 3 DPEF *ǅǑƾ01+ ʆǦ *-ƿ01+ǯ )&01ǯ ),$Ǯ$!- ʍ !+,/*ǯ *2 ǒ 0&$* ǰ ǒ *2 ʆǦ ʀ ǹ/2$$"! ʀ ǹ ,+1Ǯ#/& ʀ ǹ/2$$"!ǹ ,+1Ǯ#/& ǒ ʍ !+,/*ǯƽǒƾƽƽǰǒ ʍ !+,/*ǯƽǒƾƽǰǒ ʍ !+,/*ǯƽǒƾƽǰǒ ʍ !+,/*ǯƽǒƾƽǰǒ 0&$* ʍ ! 2 %6ǯƽǒƿǰ ǰ ǒ !1ʅ!!Ǒ1/&* ǰ ćFSF JT POF DIBOHF UP OPUF IFSF CVU UP FYQMBJO MBUFS ćF VOJGPSN QSJPS PO 0&$* IBT CFFO DIBOHFE UP B IBMG$Įłİĵņ QSJPS ćF $BVDIZ EJTUSJCVUJPO JT B VTFGVM UIJDLUBJMFE QSPCBCJMJUZ EJTUSJCVUJPO SFMBUFE UP UIF 4UVEFOU U EJTUSJCVUJPO ćFSFT BO 0WFSUIJOLJOH CPY BCPVU JU MBUFS JO UIF DIBQUFS QBHF :PV DBO UIJOL PG JU BT B XFBLMZ SFHVMBSJ[JOH QSJPS GPS TUBOEBSE EFWJBUJPOT 8FMM VTF JU BHBJO MBUFS JO UIF DIBQUFS "OE ZPVMM IBWF NBOZ DIBODFT UP HFU VTFE UP JU BT UIF CPPL DPOUJOVFT #VU OPUF UIBU JU JT OPU OFDFTTBSZ UP VTF B IBMG$BVDIZ ćF VOJGPSN QSJPS XJMM TUJMM XPSL BOE B TJNQMF FYQPOFOUJBM QSJPS JT BMTP BQQSPQSJBUF *O UIJT FYBNQMF BT JO NBOZ UIFSF JT TP HMC estimates

What is “Cauchy”? • One of many things named after
Augustin-Louis Cauchy (KO-shee) • Ratio of two Gaussian samples • Useful distribution with thick tails • Parameters: location of mode, scale • Mean and variance undefined • Related to Lévy flights Baron Augustin-Louis Cauchy (1789–1857)

Extract samples • Once you have samples, they are just
samples ǦƾǑǆǁ ƽǑƿƿ ǦƿǑƿǆ ǦƾǑǂǅ ƾƾǂǁ ƾ ƽǑǀǆ ƽǑƾǀ ƽǑƾǄ ƽǑǂǆ ƾƾǁǁ ƾ 0&$* ƽǑǆǂ ƽǑƽǂ ƽǑǅǄ ƾǑƽǀ ƾǆǃƽ ƾ ćF /"0*-)" GVODUJPO XJMM BMTP SFDPNQVUF %*$ BOE 8"*$ GPS ZPV VTJOH POMZ UIF TBNQMFT :PV DBO BMTP BEE UIF ,/"0 BSHVNFOU UP BOZ PSJHJOBM *-ƿ01+ DBMM *U XJMM B NBUJDBMMZ QBSBMMFMJ[F UIF DIBJOT 7JTVBMJ[BUJPO #Z QMPUUJOH UIF TBNQMFT ZPV DBO HFU B EJSFDU BQQSFDJBUJPO GPS (BVTTJBO RVBESBUJD UIF BDUVBM QPTUFSJPS EFOTJUZ IBT UVSOFE PVU UP CF 5P QVMM PVU UIF T QMFT VTF 3 DPEF -,01 ʆǦ "51/ 1Ǒ0*-)"0ǯ *ǅǑƾ01+ ǰ 01/ǯ-,01ǰ &01 ,# ǂ ɢ Ǔ +2* ǱƾǓƾƽƽƽǯƾ!ǰǲ ǆǑǀ ǆǑǀǁ ǆǑƿƾ ǆǑǁǀ ǆǑƿǅ ǑǑǑ ɢ / Ǔ +2* ǱƾǓƾƽƽƽǯƾ!ǰǲ ǦƽǑƾǀǀ ǦƽǑƿƾǁ ǦƽǑƿƾǂ ǦƽǑƿƿǆ ǦƽǑƿƽǆ ǑǑǑ ɢ Ǔ +2* ǱƾǓƾƽƽƽǯƾ!ǰǲ ǦƾǑǆƾ ǦƿǑƿǂ ǦƿǑƿǃ ǦƾǑǆǅ ǦƾǑǆǆ ǑǑǑ ɢ / Ǔ +2* ǱƾǓƾƽƽƽǯƾ!ǰǲ ƽǑƾǀǀ ƽǑǀǃǄ ƽǑǂǀǀ ƽǑƿǂǁ ƽǑǁǃǅ ǑǑǑ ɢ 0&$*Ǔ +2* ǱƾǓƾƽƽƽǯƾ!ǰǲ ƽǑǆǅǅ ƽǑǆǁǆ ƽǑǆƽǁ ƽǑǆǄǃ ƽǑǆǀǁ ǑǑǑ /PUF UIBU -,01 IFSF JT B )&01 OPU B !1Ǒ#/*" ćJT GBDU XJMM CF WFSZ VTFGVM UP VTF PO XIFO XF FODPVOUFS NVMUJMFWFM NPEFMT 'PS OPX JG JU EPFTOU CFIBWF MJLF ZPV FYQFDU ZPV DBO DPFSDF JU UP B EBUB GSBNF XJUI -,01ʆǦ0Ǒ!1Ǒ#/*"ǯ-,01ǰ ćFSF BSF POMZ TBNQMFT GPS FBDI QBSBNFUFS CFDBVTF UIBUT UIF EFGBVMU *O UIJT DBTF JUT FOPVHI ćFSFT

Figure 8.3

Check the chain • Sometimes it doesn’t work • First
and most important check: trace plot &"4: ).$ ƽ 'ĶĴłĿĲ ƐƎ 5SBDF QMPU PG UIF .BSLPW DIBJO GSPN UIF SVHHFEOFTT NPEFM (ǃǏƼ./) ćJT JT B DMFBO IFBMUIZ .BSLPW DIBJO CPUI TUBUJPOBSZ BOE warmup ĽĹļŁ " USBDF QMPU NFSFMZ QMPUT UIF TBNQMFT JO TFRVFOUJBM PSEFS KPJO .BSLPWT QBUI UISPVHI UIF JTMBOET JO UIF NFUBQIPS BU UIF TUBSU PG UIF D USBDF QMPU PG FBDI QBSBNFUFS DBO IFMQ UP EJBHOPTF NBOZ DPNNPO QSP DPNF UP SFDPHOJ[F B IFBMUIZ GVODUJPOJOH .BSLPW DIBJO RVJDL DIFDLT B MPU PG QFBDF PG NJOE " USBDF QMPU JTOU UIF MBTU UIJOH BOBMZTUT EP UP JO #VU JUT OFBSMZ BMXBZT UIF ĕSTU *O UIF UFSSBJO SVHHFEOFTT FYBNQMF UIF USBDF QMPU TIPXT B WFSZ I XJUI +'*/ǿ(Ǖǡǎ./)Ȁ ćF SFTVMU JT TIPXO JO 'ĶĴłĿĲ Ɛƌ &BDI QMPU JO UIJT ĕHVSF JT TJNJM ZPV KVTU VTFE GPS FYBNQMF +'*/ǿ+*./ɶǢ/4+ ʙǫ'ǫȀ CVU XJUI TPNF MBCFMJOH UP IFMQ PVU :PV DBO UIJOL PG UIF [JH[BHHJOH USBDF PG FBDI UIF DIBJO UPPL UISPVHI FBDI EJNFOTJPO PG QBSBNFUFS TQBDF *UT FBT UIF [PPNFE QMPU JO UIF MPXFS SJHIU DPSOFS XIJDI EJTQMBZT POMZ UIF XBSNVQ ćF HSBZ SFHJPO JO FBDI QMPU UIF ĕSTU POFUIPVTBOE TBNQMFT NBS QMFT %VSJOH BEBQUBUJPO UIF .BSLPW DIBJO JT MFBSOJOH UP NPSF FďDJ QPTUFSJPS EJTUSJCVUJPO 4P UIFTF TBNQMFT BSF OPU OFDFTTBSJMZ SFMJBCMF UP BSF BVUPNBUJDBMMZ EJTDBSEFE CZ 3/-/ǡ.(+' . XIJDI SFUVSOT PO JO UIF XIJUF SFHJPOT PG 'ĶĴłĿĲ Ɛƌ

Statistical Rethinking Fall 2017 Lecture 10

Statistical Rethinking Fall 2017 Lecture 10

More Decks by Richard McElreath

Other Decks in Education

Featured

Transcript