Data analysis with MCMC

Dan Foreman-Mackey

April 02, 2013
Transcript

  1. I work on: the Local Group, exoplanets, variable stars, image modeling, calibration. not writing papers
  2. I work on: the Local Group, exoplanets, variable stars, image modeling, calibration. not writing papers
  3. Physics → Data. The graphical model of my research: a sketch of the generative view of data analysis.
  4. for example, a line. synthetic data: $\hat{y}_n = m x_n + b + \epsilon_n$, noise model: $\epsilon_n \sim \mathcal{N}(0, \sigma_n^2)$, so that
      $p(\{y_n, x_n, \sigma_n^2\} \mid m, b) \propto \exp\left(-\tfrac{1}{2}\chi^2\right)$, where
      $\chi^2 = \sum_{n=1}^{N} \frac{[y_n - (m x_n + b)]^2}{\sigma_n^2}$
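The model on this slide maps directly to a few lines of NumPy. A minimal sketch, with made-up slope, intercept, and per-point error bars (illustrative values, not from the talk):

    import numpy as np

    # Made-up "true" parameters and design, for illustration only.
    m_true, b_true = 0.5, 2.0
    x = np.sort(10 * np.random.rand(50))
    yerr = 0.1 + 0.4 * np.random.rand(50)                  # sigma_n
    y = m_true * x + b_true + yerr * np.random.randn(50)   # y_n = m x_n + b + eps_n

    # Chi-squared of the synthetic data under the true parameters.
    chi2 = np.sum(((y - (m_true * x + b_true)) / yerr) ** 2)
    print(chi2)  # roughly the number of data points (50)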
  5. $p(\mathrm{data} \mid \mathrm{physics})$: likelihood function / generative model
      $p(\mathrm{physics} \mid \mathrm{data}) \propto p(\mathrm{physics})\, p(\mathrm{data} \mid \mathrm{physics})$: posterior probability
  6. what if we underestimated our error bars? (by some additive variance)
      $p(\{y_n, x_n, \sigma_n^2\} \mid m, b, \sigma^2) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi(\sigma_n^2 + \sigma^2)}} \exp\left[-\frac{1}{2} \frac{[y_n - (m x_n + b)]^2}{\sigma_n^2 + \sigma^2}\right]$
      where $\sigma_n^2$ is the "physics" and $\sigma^2$ is the "jitter" (not physics)
  7. $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$
      marginalize. Do The Right Thing™
  8. in our example:
      $p(\{y_n, x_n, \sigma_n\} \mid m, b) = \int \mathrm{d}\sigma^2\, p(\sigma^2) \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi(\sigma_n^2 + \sigma^2)}} \exp\left[-\frac{1}{2} \frac{[y_n - (m x_n + b)]^2}{\sigma_n^2 + \sigma^2}\right]$
      BOOM!
  9. What do you really want?
      $\mathrm{E}_p[f(\theta)] = \frac{1}{Z} \int f(\theta)\, p(\theta)\, p(D \mid \theta)\, \mathrm{d}\theta$
  10. probably. What do you really want?
      $\mathrm{E}_p[f(\theta)] = \frac{1}{Z} \int f(\theta)\, p(\theta)\, p(D \mid \theta)\, \mathrm{d}\theta$
  11. expectation: $\mathrm{E}_p[f(\theta)] = \frac{1}{Z} \int f(\theta)\, p(\theta)\, p(D \mid \theta)\, \mathrm{d}\theta$
      marginalization: $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$
  12. expectation: $\mathrm{E}_p[f(\theta)] = \frac{1}{Z} \int f(\theta)\, p(\theta)\, p(D \mid \theta)\, \mathrm{d}\theta$
      marginalization: $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$
      large number of dimensions
  13. expectation: $\mathrm{E}_p[f(\theta)] = \frac{1}{Z} \int f(\theta)\, p(\theta)\, p(D \mid \theta)\, \mathrm{d}\theta$
      marginalization: $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$
      large number of dimensions. whoa!
  14. expectation: $\mathrm{E}_p[f(\theta)] = \frac{1}{Z} \int f(\theta)\, p(\theta)\, p(D \mid \theta)\, \mathrm{d}\theta$
      marginalization: $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$
      large number of dimensions. whoa! This is HARD. (in general)
  15. as you learned in middle school:
      $\int f(x)\, p(x)\, \mathrm{d}x \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n)$, where $x_n \sim p(x)$
      error $\propto 1/\sqrt{N'}$, where $N'$ is the number of independent samples
  16. as you learned in middle school:
      $\int f(x)\, p(x)\, \mathrm{d}x \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n)$, where $x_n \sim p(x)$
      error $\propto 1/\sqrt{N'}$, where $N'$ is the number of independent samples
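As a sanity check on the claim above, here is a small sketch (not from the talk) that estimates $\int f(x)\, p(x)\, \mathrm{d}x$ for $f(x) = x^2$ with $p(x)$ a standard normal, so the true answer is 1; the error shrinks roughly as $1/\sqrt{N}$:

    import numpy as np

    def mc_estimate(f, n_samples):
        # Draw x_n ~ p(x) = N(0, 1) and average f(x_n).
        x = np.random.randn(n_samples)
        return np.mean(f(x))

    f = lambda x: x ** 2  # E[f(x)] = 1 under N(0, 1)
    for n in [100, 10000, 1000000]:
        est = mc_estimate(f, n)
        print(n, est, abs(est - 1.0))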
  17. MCMC draws samples from a probability function and all you

    need to be able to do is evaluate the function (up to a constant)
  18. Metropolis–Hastings in an ideal world. propose a new position: $x' \sim q(x'; x)$. accept?
      $p(\mathrm{accept}) = \min\left(1, \frac{p(x')\, q(x; x')}{p(x)\, q(x'; x)}\right)$
  19. Metropolis–Hastings in an ideal world. propose a new position: $x' \sim q(x'; x)$. accept? definitely.
      $p(\mathrm{accept}) = \min\left(1, \frac{p(x')\, q(x; x')}{p(x)\, q(x'; x)}\right)$
  20. Metropolis–Hastings in an ideal world. propose a new position: $x' \sim q(x'; x)$. accept? definitely.
      $p(\mathrm{accept}) = \min\left(1, \frac{p(x')\, q(x; x')}{p(x)\, q(x'; x)}\right)$
      only relative probabilities
  21. Metropolis–Hastings in an ideal world. $x' \sim q(x'; x)$. accept?
      $p(\mathrm{accept}) = \min\left(1, \frac{p(x')\, q(x; x')}{p(x)\, q(x'; x)}\right)$
  22. not this time. Metropolis–Hastings in an ideal world. $x' \sim q(x'; x)$. accept?
      $p(\mathrm{accept}) = \min\left(1, \frac{p(x')\, q(x; x')}{p(x)\, q(x'; x)}\right)$
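A minimal Metropolis–Hastings sketch matching the rule on these slides, with a symmetric Gaussian proposal so the $q$ terms cancel; the target density and step size are illustrative assumptions, not from the talk:

    import numpy as np

    def log_p(x):
        # Example target: an unnormalized standard Gaussian.
        return -0.5 * np.sum(x ** 2)

    def metropolis_hastings(log_p, x0, n_steps, step=0.5):
        x = np.array(x0, dtype=float)
        lp = log_p(x)
        chain = np.empty((n_steps, x.size))
        for i in range(n_steps):
            # Propose x' ~ q(x'; x): symmetric Gaussian centered on x.
            x_new = x + step * np.random.randn(x.size)
            lp_new = log_p(x_new)
            # Accept with probability min(1, p(x') / p(x)).
            if np.log(np.random.rand()) < lp_new - lp:
                x, lp = x_new, lp_new
            chain[i] = x
        return chain

    chain = metropolis_hastings(log_p, [0.0, 0.0], 5000)
    print(chain.mean(axis=0), chain.std(axis=0))  # should approach 0 and 1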
  23. Metropolis–Hastings in the real world: D(D+1)/2 tuning parameters, where D is the dimension of your problem (~number of parameters).
      [cartoon: "Scientific Awesomeness" as a function of "how hard is MCMC", comparing Metropolis–Hastings with how things should be]
  24. Ensemble Samplers in the real world. choose a helper, propose a new position. accept?
      $p(\mathrm{accept}) = \min\left(1, z^{D-1}\, \frac{p(x')}{p(x)}\right)$
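The acceptance rule above is the Goodman & Weare stretch move used by emcee. A rough sketch of a single update, assuming a log-density log_p, a walker x, a randomly chosen helper walker x_helper, and the conventional stretch scale a = 2 (these names and defaults are assumptions for illustration):

    import numpy as np

    def stretch_move(log_p, x, x_helper, a=2.0):
        # One stretch-move update of walker x using a helper walker.
        ndim = x.size
        # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * np.random.rand() + 1.0) ** 2 / a
        # Propose along the line joining the helper and the walker.
        x_new = x_helper + z * (x - x_helper)
        # Accept with probability min(1, z^(D-1) p(x') / p(x)).
        if np.log(np.random.rand()) < (ndim - 1) * np.log(z) + log_p(x_new) - log_p(x):
            return x_new
        return x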
  25. $p(\{y_n, x_n, \sigma_n^2\} \mid m, b, \sigma^2) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi(\sigma_n^2 + \sigma^2)}} \exp\left[-\frac{1}{2} \frac{[y_n - (m x_n + b)]^2}{\sigma_n^2 + \sigma^2}\right]$
  26. import numpy as np
      import emcee

      def lnprobfn(p, x, y, yerr):
          m, b, d2 = p
          if not 0 <= d2 <= 1:
              return -np.inf
          ivar = 1.0 / (yerr ** 2 + d2)
          chi2 = np.sum((y - m * x - b) ** 2 * ivar)
          return -0.5 * (chi2 - np.sum(np.log(ivar)))

      # Load data.
      # x, y, yerr = ...

      # Set up sampler and initialize the walkers.
      nwalkers, ndim = 100, 3
      sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprobfn,
                                      args=(x, y, yerr))
      p0 = [[np.random.rand(), 10 * np.random.rand(), np.random.rand()]
            for k in range(nwalkers)]

      # Go.
      sampler.run_mcmc(p0, 1000)
      m, b, d2 = sampler.flatchain.T
  27. import numpy as np

      def lnprobfn(p, x, y, yerr):
          m, b, d2 = p
          if not 0 <= d2 <= 1:
              return -np.inf
          ivar = 1.0 / (yerr ** 2 + d2)
          chi2 = np.sum((y - m * x - b) ** 2 * ivar)
          return -0.5 * (chi2 - np.sum(np.log(ivar)))

      $p(\sigma^2) = \begin{cases} 1, & \text{if } 0 \le \sigma^2 \le 1 \\ 0, & \text{otherwise} \end{cases}$
      $\chi^2 = \sum_{n=1}^{N} \frac{[y_n - (m x_n + b)]^2}{\sigma_n^2 + \sigma^2}$
      $\ln p(\{x_n, y_n, \sigma_n\} \mid m, b, \sigma^2) = -\frac{1}{2}\chi^2 - \frac{1}{2}\sum_{n=1}^{N} \ln(\sigma_n^2 + \sigma^2)$
  28. import emcee

      # initialize the sampler
      nwalkers, ndim = 100, 3
      sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprobfn,
                                      args=(x, y, yerr))

      # initialize each walker: m_k ~ U(0, 1), b_k ~ U(0, 10), sigma2_k ~ U(0, 1)
      p0 = [[np.random.rand(), 10 * np.random.rand(), np.random.rand()]
            for k in range(nwalkers)]

      # run MCMC for 1000 steps
      sampler.run_mcmc(p0, 1000)

      # get the samples
      m, b, d2 = sampler.flatchain.T
  29. what does it MEAN?
      1. sort the samples
      2. compute moments/quantiles (0.16, 0.5, 0.84): $X = \bar{X}^{+\sigma_+}_{-\sigma_-}$
      3. machine-readable samplings (including prior values)
  30. what does it MEAN?
      1. sort the samples
      2. compute moments/quantiles (0.16, 0.5, 0.84): $X = \bar{X}^{+\sigma_+}_{-\sigma_-}$
      3. machine-readable samplings (including prior values): my pipe dream
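A sketch of the summary recipe on this slide, reusing the sampler.flatchain samples from the emcee snippet above (the variable names are carried over from that example):

    import numpy as np

    samples = sampler.flatchain  # one row per sample: (m, b, sigma^2)

    for name, s in zip(["m", "b", "sigma^2"], samples.T):
        lo, med, hi = np.percentile(s, [16, 50, 84])
        # Report X = median, +/- distances to the 84th / 16th percentiles.
        print("{0} = {1:.3f} +{2:.3f} -{3:.3f}".format(name, med, hi - med, med - lo))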
  31. what if we want to compare models?
      $p(\theta \mid D, M) = \frac{1}{Z(M)}\, p(\theta \mid M)\, p(D \mid \theta, M)$
      $Z = p(D \mid M) = \int p(D, \theta \mid M)\, \mathrm{d}\theta$
  32. what if we want to compare models?
      $p(\theta \mid D, M) = \frac{1}{Z(M)}\, p(\theta \mid M)\, p(D \mid \theta, M)$
      $Z = p(D \mid M) = \int p(D, \theta \mid M)\, \mathrm{d}\theta$
  33. what to do? [plot: $p(\theta \mid D)$ as a function of $\theta$]
      $Z = p(D \mid M) = \int p(D, \theta \mid M)\, \mathrm{d}\theta$
      1. burn-in & priors
      2. use a different algorithm (gasp!)
  34. what to do? [plot: $p(\theta \mid D)$ as a function of $\theta$]
      $Z = p(D \mid M) = \int p(D, \theta \mid M)\, \mathrm{d}\theta$
      1. burn-in & priors
      2. use a different algorithm (gasp!) github.com/eggplantbren/DNest3
  35. Alex Conley (UC Boulder) Jason Davies (Jason Davies Ltd.) Will

    Meierjurgen Farr (Northwestern) David W. Hogg (NYU) Dustin Lang (CMU) Phil Marshall (Oxford) Ilya Pashchenko (ASC LPI, Moscow) Adrian Price-Whelan (Columbia) Jeremy Sanders (Cambridge) Joe Zuntz (Oxford) Eric Agol (UW) Jo Bovy (IAS) Jacqueline Chen (MIT) John Gizis (Delaware) Jonathan Goodman (NYU) Marius Millea (UC Davis) Jennifer Piscionere (Vanderbilt) contributors
  36. take home. your model: $p(D \mid \theta, \alpha)$, with $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$. it's hammer time!
  37. take home. your model: $p(D \mid \theta, \alpha)$, with $p(D \mid \theta) \propto \int p(\alpha)\, p(D \mid \theta, \alpha)\, \mathrm{d}\alpha$. $\theta =$ it's hammer time!
  38. Alex Conley (UC Boulder) Jason Davies (Jason Davies Ltd.) Will

    Meierjurgen Farr (Northwestern) David W. Hogg (NYU) Dustin Lang (CMU) Phil Marshall (Oxford) Ilya Pashchenko (ASC LPI, Moscow) Adrian Price-Whelan (Columbia) Jeremy Sanders (Cambridge) Joe Zuntz (Oxford) Eric Agol (UW) Jo Bovy (IAS) Jacqueline Chen (MIT) John Gizis (Delaware) Jonathan Goodman (NYU) Marius Millea (UC Davis) Jennifer Piscionere (Vanderbilt) thanks!