Data analysis with MCMC

Dan Foreman-Mackey

April 02, 2013

Transcript

  1. data analysis with Markov chain Monte Carlo Dan Foreman-Mackey CCPP@NYU

  2.    dan.iel.fm dfm @exoplaneteer Dan Foreman-Mackey

  3. I'm a grad student in Camp Hogg at NYU. David W. Hogg

  4. I'm an engineer in Camp Hogg at NYU. David W. Hogg

  5. David W. Hogg: I build tools for data analysis in astronomy (mostly)

  6. introducing emcee: the MCMC Hammer. arxiv.org/abs/1202.3665 dan.iel.fm/emcee it's hammer time!

  7. I work on: the Local Group, exoplanets, variable stars, image modeling, calibration

  8. I work on: the Local Group, exoplanets, variable stars, image modeling, calibration. not writing papers

  9. I work on: the Local Group, exoplanets, variable stars, image modeling, calibration. not writing papers
  10. Physics, Data: the graphical model of my research. a sketch of

  11. Physics, Data: the graphical model of my research. a sketch of the generative view of data analysis

  12. for example, a line. synthetic data: ŷ_n = m x_n + b + ε_n; noise model: ε_n ~ N(0, σ_n²); likelihood: p({y_n, x_n, σ_n²} | m, b) ∝ exp(−χ²/2), where χ² = Σ_{n=1}^N [y_n − (m x_n + b)]² / σ_n²
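
The χ² sum above is easy to play with numerically. Here is a minimal sketch (the slope, intercept, and error bars below are invented for illustration) that generates synthetic data from the line model and evaluates χ²:

```python
import numpy as np

# A sketch of the generative view: draw data from the line model, then
# evaluate the chi^2 that appears in the likelihood exp(-chi^2 / 2).
rng = np.random.default_rng(42)
m_true, b_true = 1.3, 4.5  # invented "true" parameters

# Synthetic data: y_n = m x_n + b + eps_n with eps_n ~ N(0, sigma_n^2).
N = 50
x = np.sort(10 * rng.random(N))
yerr = 0.1 + 0.5 * rng.random(N)
y = m_true * x + b_true + yerr * rng.standard_normal(N)

def chi2(m, b):
    """chi^2 = sum_n [y_n - (m x_n + b)]^2 / sigma_n^2."""
    return np.sum(((y - (m * x + b)) / yerr) ** 2)

# The true parameters should give a much smaller chi^2 (~N) than a bad guess.
print(chi2(m_true, b_true), chi2(0.0, 0.0))
```
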
  13. Physics, Data: the graphical model of my research. a sketch of inference

  14. likelihood function / generative model: p(data | physics). posterior probability: p(physics | data) ∝ p(physics) p(data | physics)

  15. good/bad news: physics is everything, including things we don't care about!
  16. a more realistic model: p(D | θ, α), where D is the data, θ is the cool physics, and α is the garbage

  17. what if we underestimated our error bars (by some additive variance)? p({y_n, x_n, σ_n²} | m, b, s²) = Π_{n=1}^N [2π(σ_n² + s²)]^(−1/2) exp(−[y_n − (m x_n + b)]² / [2(σ_n² + s²)]). m and b are the "physics"; the "jitter" s² is not physics

  18. marginalize: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα. Do The Right Thing™

  19. in our example: p({y_n, x_n, σ_n} | m, b) = ∫ ds² p(s²) Π_{n=1}^N [2π(σ_n² + s²)]^(−1/2) exp(−[y_n − (m x_n + b)]² / [2(σ_n² + s²)]). BOOM!
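
To see the marginalization in action, here is a toy numerical sketch: invented straight-line data whose true scatter exceeds the quoted error bars, a uniform prior p(s²) = 1 on [0, 1], and the jitter variance integrated out on a grid:

```python
import numpy as np

# Toy sketch: marginalize the jitter variance s^2 out of the likelihood
# numerically. All data values here are invented for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
yerr = 0.3 * np.ones_like(x)
# True scatter includes an extra variance of 0.25 beyond the quoted errors.
y = 1.0 * x + 2.0 + np.sqrt(yerr**2 + 0.25) * rng.standard_normal(x.size)

def lnlike(m, b, s2):
    """ln of prod_n N(y_n; m x_n + b, sigma_n^2 + s^2)."""
    var = yerr**2 + s2
    return -0.5 * np.sum((y - (m * x + b)) ** 2 / var + np.log(2 * np.pi * var))

# p(D | m, b) = int_0^1 p(s^2) p(D | m, b, s^2) ds^2, via the trapezoid rule.
s2_grid = np.linspace(0.0, 1.0, 201)
vals = np.exp([lnlike(1.0, 2.0, s2) for s2 in s2_grid])
marginal = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(s2_grid))
print(marginal)
```

Because the quoted error bars underestimate the true scatter, the likelihood at the true jitter (s² = 0.25) beats the no-jitter model by a wide margin, which is why ignoring s² would bias the fit.
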
  20. What do you really want?

  21. What do you really want? E_p[f(θ)] = (1/Z) ∫ f(θ) p(θ) p(D | θ) dθ

  22. probably. What do you really want? E_p[f(θ)] = (1/Z) ∫ f(θ) p(θ) p(D | θ) dθ

  23. expectation: E_p[f(θ)] = (1/Z) ∫ f(θ) p(θ) p(D | θ) dθ. marginalization: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα

  24. expectation: E_p[f(θ)] = (1/Z) ∫ f(θ) p(θ) p(D | θ) dθ. marginalization: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα. large number of dimensions

  25. expectation: E_p[f(θ)] = (1/Z) ∫ f(θ) p(θ) p(D | θ) dθ. marginalization: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα. large number of dimensions. whoa!

  26. expectation: E_p[f(θ)] = (1/Z) ∫ f(θ) p(θ) p(D | θ) dθ. marginalization: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα. large number of dimensions. whoa! This is HARD (in general)
  27. OK now that we agree…

  28. ∫ f(x) p(x) dx ≈ (1/N) Σ_{n=1}^N f(x_n), where x_n ~ p(x). error ∝ 1/√N′, where N′ is the number of independent samples (as you learned in middle school)

  29. ∫ f(x) p(x) dx ≈ (1/N) Σ_{n=1}^N f(x_n), where x_n ~ p(x). error ∝ 1/√N′, where N′ is the number of independent samples (as you learned in middle school)
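
This estimator is easy to check empirically. A small sketch with a standard normal p(x) and f(x) = x², whose exact expectation is 1:

```python
import numpy as np

# Monte Carlo estimate of int f(x) p(x) dx ~ (1/N) sum_n f(x_n), x_n ~ p(x).
# Here p is a standard normal and f(x) = x^2, so the exact answer is 1.
rng = np.random.default_rng(1)

def mc_estimate(n_samples):
    x = rng.standard_normal(n_samples)
    return np.mean(x ** 2)

# The error shrinks like 1/sqrt(N): 100x more samples, ~10x smaller error.
for n in (100, 10_000, 1_000_000):
    print(n, abs(mc_estimate(n) - 1.0))
```
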
  30. MCMC draws samples from a probability function, and all you need to be able to do is evaluate the function (up to a constant)
  31. Metropolis–Hastings

  32. Metropolis–Hastings in an ideal world

  33. Metropolis–Hastings in an ideal world: start here, perhaps

  34. Metropolis–Hastings in an ideal world: propose a new position x′ ~ q(x′; x)

  35. Metropolis–Hastings in an ideal world: propose a new position x′ ~ q(x′; x). accept? p(accept) = min(1, [p(x′) q(x; x′)] / [p(x) q(x′; x)])

  36. Metropolis–Hastings in an ideal world: propose a new position x′ ~ q(x′; x). accept? p(accept) = min(1, [p(x′) q(x; x′)] / [p(x) q(x′; x)]). definitely.

  37. Metropolis–Hastings in an ideal world: propose a new position x′ ~ q(x′; x). accept? p(accept) = min(1, [p(x′) q(x; x′)] / [p(x) q(x′; x)]). only relative probabilities

  38. Metropolis–Hastings in an ideal world: x′ ~ q(x′; x)

  39. Metropolis–Hastings in an ideal world: x′ ~ q(x′; x)

  40. Metropolis–Hastings in an ideal world: x′ ~ q(x′; x). accept? p(accept) = min(1, [p(x′) q(x; x′)] / [p(x) q(x′; x)])

  41. Metropolis–Hastings in an ideal world: x′ ~ q(x′; x). accept? p(accept) = min(1, [p(x′) q(x; x′)] / [p(x) q(x′; x)]). not this time.
  42. Metropolis–Hastings in an ideal world

  43. double count! Metropolis–Hastings in an ideal world

  44. Metropolis–Hastings in an ideal world

  45. Metropolis–Hastings in an ideal world

  46. Metropolis–Hastings in the real world

  47. Metropolis–Hastings in the real world

  48. Metropolis–Hastings in the real world

  49. Metropolis–Hastings in the real world

  50. Metropolis–Hastings in the real world

  51. Metropolis–Hastings in the real world the Small Acceptance Fraction problem

  52. Metropolis–Hastings in the real world

  53. Metropolis–Hastings in the real world

  54. Metropolis–Hastings in the real world the Huge Acceptance Fraction problem

  55. a brief aside.

  56. YOUR LIKELIHOOD FUNCTION IS EXPENSIVE

  57. Metropolis–Hastings in the real world

  58. YOUR LIKELIHOOD FUNCTION IS EXPENSIVE REMEMBER?

  59. Metropolis–Hastings in the real world

  60. Metropolis–Hastings in the real world: that's D(D+1)/2 tuning parameters

  61. Metropolis–Hastings in the real world: that's D(D+1)/2 tuning parameters, where D is the dimension of your problem

  62. Metropolis–Hastings in the real world: that's D(D+1)/2 tuning parameters, where D is the dimension of your problem. (plot: Scientific Awesomeness vs. how hard is MCMC (~number of parameters), comparing Metropolis–Hastings with how things Should be)
  63. that being said, go code up your own Metropolis–Hastings code right now. (well not right right now)
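
If you do code it up, the whole algorithm fits in a few lines. A bare-bones sketch for a one-dimensional standard normal target, using a symmetric Gaussian proposal so the q terms cancel and the acceptance probability reduces to min(1, p(x′)/p(x)):

```python
import numpy as np

# Bare-bones Metropolis-Hastings for a 1-D target, as a sketch. Only
# relative probabilities are needed, so lnprob can drop constants.
rng = np.random.default_rng(2)

def lnprob(x):
    # Target: standard normal, up to a constant.
    return -0.5 * x ** 2

def metropolis_hastings(lnprob, x0, n_steps, step=1.0):
    chain = np.empty(n_steps)
    x, lp = x0, lnprob(x0)
    n_accept = 0
    for i in range(n_steps):
        x_new = x + step * rng.standard_normal()  # propose x' ~ q(x'; x)
        lp_new = lnprob(x_new)
        if np.log(rng.random()) < lp_new - lp:    # accept w.p. min(1, p(x')/p(x))
            x, lp = x_new, lp_new
            n_accept += 1
        chain[i] = x                              # on rejection, count x again
    return chain, n_accept / n_steps

chain, acc = metropolis_hastings(lnprob, 0.0, 50_000)
print(np.mean(chain), np.std(chain), acc)  # mean ~ 0, std ~ 1
```

Note the "double count" from the slides: a rejected proposal means the current position is recorded again, which is required for the chain to have the right stationary distribution.
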
  64. Jonathan Goodman & Jonathan Weare, "Ensemble samplers with affine invariance" (dfm.io/mcmc-gw10)

  65. Ensemble Samplers in the real world

  66. Ensemble Samplers in the real world

  67. Ensemble samplers with affine invariance

  68. Affine Transformation: y = A x + b. hard, easy

  69. Affine Transformation: y = A x + b. hard and easy are THE SAME
  70. Ensemble Samplers in the real world

  71. Ensemble Samplers in the real world

  72. choose a helper Ensemble Samplers in the real world

  73. choose a helper Ensemble Samplers in the real world

  74. Ensemble Samplers in the real world: choose a helper, propose a new position

  75. Ensemble Samplers in the real world: choose a helper, propose a new position. accept? p(accept) = min(1, z^(D−1) p(x′) / p(x))
  76. Ensemble Samplers in the real world

  77. Ensemble Samplers in the real world

  78. choose a helper Ensemble Samplers in the real world

  79. Ensemble Samplers in the real world

  80. go code up your own Ensemble Sampler code right now?
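
For reference, here is a from-scratch sketch of the Goodman & Weare stretch move that emcee implements (the serial, one-walker-at-a-time variant), for a 2-D Gaussian target; the target and all numbers are invented for illustration:

```python
import numpy as np

# Sketch of the stretch move: walker x_k picks a helper x_j from the rest
# of the ensemble and proposes x' = x_j + z (x_k - x_j), with z drawn from
# g(z) ~ 1/sqrt(z) on [1/a, a]; accept w.p. min(1, z^(D-1) p(x') / p(x_k)).
rng = np.random.default_rng(3)

def lnprob(x):
    # Target: D-dimensional standard normal, up to a constant.
    return -0.5 * np.sum(x ** 2)

def stretch_move_step(walkers, lnprob, a=2.0):
    n_walkers, ndim = walkers.shape
    for k in range(n_walkers):
        j = rng.choice([i for i in range(n_walkers) if i != k])  # helper
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a            # z ~ g(z)
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        ln_accept = (ndim - 1) * np.log(z) + lnprob(proposal) - lnprob(walkers[k])
        if np.log(rng.random()) < ln_accept:
            walkers[k] = proposal
    return walkers

ndim, n_walkers = 2, 20
walkers = 0.1 * rng.standard_normal((n_walkers, ndim))
samples = []
for step in range(2000):
    walkers = stretch_move_step(walkers, lnprob)
    if step > 500:                    # discard burn-in
        samples.append(walkers.copy())
samples = np.concatenate(samples)
print(samples.mean(axis=0), samples.std(axis=0))  # ~ [0, 0], ~ [1, 1]
```

The affine invariance comes from the proposal being built only from differences of walker positions, so there are no per-dimension step sizes to tune, just the single parameter a.
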

  81. introducing emcee: the MCMC Hammer. arxiv.org/abs/1202.3665 dan.iel.fm/emcee it's hammer time!

  82. introducing emcee: the MCMC Hammer. arxiv.org/abs/1202.3665 dan.iel.fm/emcee to install: pip install emcee. it's hammer time!

  83. using emcee is easy, let me show you... it's hammer time!
  84. p({y_n, x_n, σ_n²} | m, b, s²) = Π_{n=1}^N [2π(σ_n² + s²)]^(−1/2) exp(−[y_n − (m x_n + b)]² / [2(σ_n² + s²)])

  85. import numpy as np
      import emcee

      def lnprobfn(p, x, y, yerr):
          m, b, d2 = p
          if not 0 <= d2 <= 1:
              return -np.inf
          ivar = 1.0 / (yerr ** 2 + d2)
          chi2 = np.sum((y - m * x - b) ** 2 * ivar)
          return -0.5 * (chi2 - np.sum(np.log(ivar)))

      # Load data.
      # x, y, yerr = ...

      # Set up sampler and initialize the walkers.
      nwalkers, ndim = 100, 3
      sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprobfn,
                                      args=(x, y, yerr))
      p0 = [[np.random.rand(), 10 * np.random.rand(), np.random.rand()]
            for k in range(nwalkers)]

      # Go.
      sampler.run_mcmc(p0, 1000)
      m, b, d2 = sampler.flatchain.T

  86. the model being evaluated: ln p({x_n, y_n, σ_n} | m, b, s²) = −χ²/2 − (1/2) Σ_{n=1}^N ln(σ_n² + s²), with χ² = Σ_{n=1}^N [y_n − (m x_n + b)]² / (σ_n² + s²), and prior p(s²) = 1 if 0 ≤ s² ≤ 1, 0 otherwise.

      import numpy as np

      def lnprobfn(p, x, y, yerr):
          m, b, d2 = p
          if not 0 <= d2 <= 1:
              return -np.inf
          ivar = 1.0 / (yerr ** 2 + d2)
          chi2 = np.sum((y - m * x - b) ** 2 * ivar)
          return -0.5 * (chi2 - np.sum(np.log(ivar)))

  87. import emcee

      # initialize the sampler
      nwalkers, ndim = 100, 3
      sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprobfn,
                                      args=(x, y, yerr))

      # initialize each walker: m_k ~ U(0, 1), b_k ~ U(0, 10), s²_k ~ U(0, 1)
      p0 = [[np.random.rand(), 10 * np.random.rand(), np.random.rand()]
            for k in range(nwalkers)]

      # run MCMC for 1000 steps
      sampler.run_mcmc(p0, 1000)

      # get the samples
      m, b, d2 = sampler.flatchain.T
  88. burn-in

  89. (figure)
  90. (figure)
  91. convergence

  92. acceptance fraction?

  93. acceptance fraction? whoa!

  94. acceptance fraction? whoa! that's a lot of significant digits...

  95. dfm/acor

  96. displaying results

  97. (figure)
  98. (figure)
  99. dfm/triangle.py

  100. sharing results you need a number for your abstract?!?

  101. 1. sort the samples 2. compute moments/quantiles (0.16, 0.5, 0.84)

  102. 1. sort the samples 2. compute moments/quantiles (0.16, 0.5, 0.84). X = X̄ (+σ₊, −σ₋). what does it MEAN?

  103. 1. sort the samples 2. compute moments/quantiles (0.16, 0.5, 0.84). X = X̄ (+σ₊, −σ₋). what does it MEAN? 3. machine-readable samplings (including prior values)

  104. 1. sort the samples 2. compute moments/quantiles (0.16, 0.5, 0.84). X = X̄ (+σ₊, −σ₋). what does it MEAN? 3. machine-readable samplings (including prior values). my pipe dream
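
Steps 1 and 2 amount to a couple of lines of NumPy. A sketch with a stand-in Gaussian "chain" whose true center and spread are invented for illustration:

```python
import numpy as np

# Turn a chain of samples into "a number for your abstract": the median
# with the 16th/84th percentiles as a central 68% credible interval.
rng = np.random.default_rng(4)
samples = rng.normal(1.3, 0.2, size=100_000)  # stand-in for an MCMC chain

q16, q50, q84 = np.percentile(samples, [16, 50, 84])
print(f"X = {q50:.2f} +{q84 - q50:.2f} -{q50 - q16:.2f}")
```
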
  105. Brendon Brewer's words of wisdom...

  106. Remember: emcee isn't always The Right Choice™. Brendon Brewer's words of wisdom...
  107. what about multimodal densities? (plot: p(θ | D) vs. θ)

  108. what about multimodal densities? (plot: p(θ | D) vs. θ)

  109. what about multimodal densities? (plot: p(θ | D) vs. θ)

  110. what about multimodal densities? (plot: p(θ | D) vs. θ, with p(accept) marked)

  111. what about multimodal densities? (plot: p(θ | D) vs. θ, with p(accept) marked)

  112. what if we want to compare models? p(θ | D, M) = [1/Z(M)] p(θ | M) p(D | θ, M), where Z = p(D | M) = ∫ p(D, θ | M) dθ

  113. what if we want to compare models? p(θ | D, M) = [1/Z(M)] p(θ | M) p(D | θ, M), where Z = p(D | M) = ∫ p(D, θ | M) dθ
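
In one dimension the evidence integral can be brute-forced on a grid, which makes the definition concrete (the Gaussian toy data and the uniform prior on [−1, 1] below are invented); in many dimensions this integral is exactly the hard part that motivates other algorithms:

```python
import numpy as np

# Toy sketch of the evidence Z = p(D | M) = int p(theta) p(D | theta) dtheta
# for a 1-D model (unknown mean, known unit variance), via quadrature.
rng = np.random.default_rng(5)
data = rng.normal(0.5, 1.0, size=20)  # invented data

def lnlike(theta):
    return -0.5 * np.sum((data - theta) ** 2 + np.log(2 * np.pi))

# Uniform prior p(theta) = 1/2 on [-1, 1]; trapezoid rule over a fine grid.
theta_grid = np.linspace(-1.0, 1.0, 2001)
integrand = 0.5 * np.exp([lnlike(t) for t in theta_grid])
Z = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta_grid))
print(Z)
```

Note that a plain MCMC chain never gives you Z, because the acceptance rule only ever uses probability ratios; that is why model comparison needs a different tool.
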
  114. what to do?

  115. what to do? 1. burn-in & priors (plot: p(θ | D) vs. θ)

  116. what to do? 1. burn-in & priors 2. use a different algorithm (gasp!): Z = p(D | M) = ∫ p(D, θ | M) dθ

  117. what to do? 1. burn-in & priors 2. use a different algorithm (gasp!): Z = p(D | M) = ∫ p(D, θ | M) dθ github.com/eggplantbren/DNest3
  118. sometimes it is useful

  119. emcee is community supported.

  120. emcee is community supported.

  121. emcee has good documentation.

  122. emcee has a live support team. danfm@nyu.edu

  123. Alex Conley (UC Boulder) Jason Davies (Jason Davies Ltd.) Will

    Meierjurgen Farr (Northwestern) David W. Hogg (NYU) Dustin Lang (CMU) Phil Marshall (Oxford) Ilya Pashchenko (ASC LPI, Moscow) Adrian Price-Whelan (Columbia) Jeremy Sanders (Cambridge) Joe Zuntz (Oxford) Eric Agol (UW) Jo Bovy (IAS) Jacqueline Chen (MIT) John Gizis (Delaware) Jonathan Goodman (NYU) Marius Millea (UC Davis) Jennifer Piscionere (Vanderbilt) contributors
  124. take home. your model: p(D | θ, α)

  125. take home. your model: p(D | θ, α). marginalize: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα. it's hammer time!

  126. take home. your model: p(D | θ, α). marginalize: p(D | θ) ∝ ∫ p(α) p(D | θ, α) dα. θ = it's hammer time!
  127. Alex Conley (UC Boulder) Jason Davies (Jason Davies Ltd.) Will

    Meierjurgen Farr (Northwestern) David W. Hogg (NYU) Dustin Lang (CMU) Phil Marshall (Oxford) Ilya Pashchenko (ASC LPI, Moscow) Adrian Price-Whelan (Columbia) Jeremy Sanders (Cambridge) Joe Zuntz (Oxford) Eric Agol (UW) Jo Bovy (IAS) Jacqueline Chen (MIT) John Gizis (Delaware) Jonathan Goodman (NYU) Marius Millea (UC Davis) Jennifer Piscionere (Vanderbilt) thanks!