pyastro16

"Tools for probabilistic data analysis in Python" from the PyAstro16 conference

Dan Foreman-Mackey

March 23, 2016

Transcript

  1. Tools for Probabilistic Data Analysis in Python * Dan Foreman-Mackey

    | #pyastro16
  2. Tools for Probabilistic Data Analysis in Python * Dan Foreman-Mackey

    | #pyastro16 * in 15 minutes
  3. What have I done?

  4. Tools for Probabilistic Data Analysis in Python

  5. Physics Data

  6. Physics Data mean model (physical parameters → predicted data)

  7. Physics Data mean model (physical parameters → predicted data) noise

    (stochastic; instrument, systematics, etc.)
  8. Physics Data mean model (physical parameters → predicted data) noise

    (stochastic; instrument, systematics, etc.) inference (parameter estimation)
  9. A few examples: (1) linear regression, (2) maximum likelihood, (3) uncertainty quantification
  10. Linear regression

  11. Linear regression if you have: a linear mean model and

    known Gaussian uncertainties and you want: "best" parameters and uncertainties
  12. Linear (mean) models: y = m x + b

  13. Linear (mean) models: y = m x + b; y = a2 x^2 + a1 x + a0

  14. Linear (mean) models: y = m x + b; y = a2 x^2 + a1 x + a0; y = a sin(x + w)

  15. Linear (mean) models: y = m x + b; y = a2 x^2 + a1 x + a0; y = a sin(x + w), plus known Gaussian uncertainties
  16. Linear regression

  17. Linear regression

  18. Linear regression

      # x, y, yerr are numpy arrays of the same shape
      import numpy as np
      A = np.vander(x, 2)
      ATA = np.dot(A.T, A / yerr[:, None]**2)
      sigma_w = np.linalg.inv(ATA)
      mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))

  19. Linear regression (same code as slide 18), showing the design matrix
      A = [[x_1, 1], [x_2, 1], ..., [x_n, 1]]

  20. Linear regression (same code), showing A as above and the weight vector w = (m, b)^T

  21. Linear regression (same code). That's it! (in other words: "Don't use MCMC for linear regression!") (a short usage sketch follows below)
  22. Maximum likelihood

  23. Maximum likelihood if you have: a non-linear mean model and/or

    non-Gaussian/unknown noise and you want: "best" parameters
  24. Likelihoods "probability of the data given physics" p(data | physics)

    parameterized by some parameters
  25. Example likelihood function (log-likelihood; f_θ is the mean model):
      ln p({y_n} | θ) = -1/2 Σ_{n=1..N} [y_n - f_θ(x_n)]^2 / σ_n^2 + constant
  26. Likelihoods

  27. Likelihoods SciPy

  28. Likelihoods

      # x, y, yerr are numpy arrays of the same shape
      import numpy as np
      from scipy.optimize import minimize

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def neg_log_like(theta):
          return 0.5 * np.sum(((model(theta, x) - y) / yerr)**2)

      r = minimize(neg_log_like, [1.0, 10.0, 1.5])
      print(r)

      ln p({y_n} | θ) = -1/2 Σ_{n=1..N} [y_n - f_θ(x_n)]^2 / σ_n^2 + constant

  29. Likelihoods (same code as slide 28), with the mean model written out:
      f_θ(x_n) = a / (1 + e^{-b (x_n - c)})

  30. Likelihoods (same code), highlighting ln p({y_n} | θ) as the quantity being computed (a short follow-up sketch is below)
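      A short follow-up sketch (not from the slides): scipy's minimize returns an OptimizeResult whose fields hold the best-fit parameters and the objective value at the optimum.

      # r comes from the minimize call above
      a_ml, b_ml, c_ml = r.x            # maximum-likelihood values of a, b, c
      print("converged:", r.success)
      print("negative log-likelihood at optimum:", r.fun)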
  31. "But it doesn't work…" — everyone

  32. (1) initialization, (2) bounds, (3) convergence, (4) gradients

  33. (1) initialization, (2) bounds, (3) convergence, (4) gradients (a sketch addressing initialization and bounds follows below)
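      A minimal sketch (not from the slides) of handling points (1) and (2): try several initializations and pass simple box bounds to scipy.optimize.minimize. The starting points and bound values here are made-up examples, and neg_log_like is the function defined on the earlier slides.

      from scipy.optimize import minimize

      bounds = [(0.0, 10.0), (0.0, 100.0), (0.0, 5.0)]  # hypothetical limits on a, b, c
      best = None
      for x0 in ([1.0, 10.0, 1.5], [0.5, 5.0, 1.0], [2.0, 20.0, 2.0]):
          r = minimize(neg_log_like, x0, method="L-BFGS-B", bounds=bounds)
          if best is None or r.fun < best.fun:
              best = r
      print(best.x, best.fun)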

  34. Gradients: d/dθ ln p({y_n} | θ)

  35. seriously?

  36. AutoDiff to the rescue! "The most criminally underused tool in

    the [PyAstro] toolkit" — adapted from justindomke.wordpress.com
  37. AutoDiff "Compile" time exact gradients

  38. AutoDiff "Compile" time chain rule

  39. AutoDiff (an expanded sketch follows below)

      GradType sin(GradType x):
          return GradType(sin(x.value), x.grad * cos(x.value))
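      A minimal forward-mode sketch (not from the slides) of the GradType idea: each value carries its derivative alongside it, and every operation applies the chain rule. The class and function names here are illustrative.

      import math

      class GradType(object):
          def __init__(self, value, grad=0.0):
              self.value = value
              self.grad = grad

      def grad_sin(x):
          # chain rule: d/dt sin(x(t)) = cos(x(t)) * x'(t)
          return GradType(math.sin(x.value), x.grad * math.cos(x.value))

      x = GradType(0.3, 1.0)   # seed dx/dx = 1 to differentiate with respect to x
      y = grad_sin(x)
      print(y.value, y.grad)   # sin(0.3) and cos(0.3)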
  40. AutoDiff "Compile" time exact gradients

  41. AutoDiff in Python: (1) Theano: deeplearning.net/software/theano; (2) HIPS/autograd: github.com/HIPS/autograd

  42. HIPS/autograd just works

      import autograd.numpy as np
      from autograd import elementwise_grad

      def f(x):
          y = np.exp(-x)
          return (1.0 - y) / (1.0 + y)

      df = elementwise_grad(f)
      ddf = elementwise_grad(df)

  43. HIPS/autograd just works (same code), with a plot of f(x), f'(x), and f''(x) for x from -4 to 4 (a short usage sketch follows below)
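      A short usage sketch (not from the slides): evaluate f and its autograd derivatives elementwise on a grid like the one in the plot above; the grid xs is an assumption.

      xs = np.linspace(-4, 4, 101)
      print(f(xs)[:3])
      print(df(xs)[:3])    # first derivative, elementwise
      print(ddf(xs)[:3])   # second derivative, elementwise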
  44. before autograd

      # x, y, yerr are numpy arrays of the same shape
      import numpy as np
      from scipy.optimize import minimize

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def neg_log_like(theta):
          r = (y - model(theta, x)) / yerr
          return 0.5 * np.sum(r*r)

      r = minimize(neg_log_like, [1.0, 10.0, 1.5])
      print(r)

  45. after autograd

      # x, y, yerr are numpy arrays of the same shape
      from autograd import grad
      import autograd.numpy as np
      from scipy.optimize import minimize

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def neg_log_like(theta):
          r = (y - model(theta, x)) / yerr
          return 0.5 * np.sum(r*r)

      r = minimize(neg_log_like, [1.0, 10.0, 1.5], jac=grad(neg_log_like))
      print(r)

  46. after autograd (same code as slide 45)

  47. after autograd (same code), with call counts: 115 calls vs 66 calls
  48. HIPS/autograd just works but… HIPS/autograd is not super fast

  49. HIPS/autograd just works but… HIPS/autograd is not super fast; you might need to drop down to a compiled language

  50. HIPS/autograd just works but… HIPS/autograd is not super fast; you might need to drop down to a compiled language; or...
  51. Use Julia?

  52. Uncertainty quantification

  53. Uncertainty quantification if you have: a non-linear mean model and/or

    non-Gaussian/unknown noise and you want: parameter uncertainties
  54. Uncertainty: p(physics | data) ∝ p(data | physics) p(physics), i.e. the distribution of physical parameters consistent with the data (posterior ∝ likelihood × prior)
  55. You're going to have to SAMPLE (photo: Flickr user Franz Jachim, Creative Commons)
  56. MCMC sampling

  57. MCMC sampling: it's hammer time! emcee, The MCMC Hammer

  58. MCMC sampling with emcee dfm.io/emcee; github.com/dfm/emcee

  59. MCMC sampling with emcee

      # x, y, yerr are numpy arrays of the same shape
      import emcee
      import numpy as np

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def log_prob(theta):
          log_prior = 0.0
          r = (y - model(theta, x)) / yerr
          return -0.5 * np.sum(r*r) + log_prior

      ndim, nwalkers = 3, 32
      p0 = np.array([1.0, 10.0, 1.5])
      p0 = p0 + 0.01*np.random.randn(nwalkers, ndim)
      sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
      sampler.run_mcmc(p0, 1000)
  60. MCMC sampling with emcee: corner plot of the posterior samples for a, b, and c, for the mean model f_θ(x_n) = a / (1 + e^{-b (x_n - c)}); made using: github.com/dfm/corner.py (a plotting sketch follows below)
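      A minimal sketch (not from the slides) of producing a corner plot like the one above with corner.py, assuming the sampler from the previous slide and a made-up burn-in of 100 steps.

      import corner

      # sampler.chain has shape (nwalkers, nsteps, ndim); drop burn-in and flatten
      samples = sampler.chain[:, 100:, :].reshape((-1, ndim))
      fig = corner.corner(samples, labels=["a", "b", "c"])
      fig.savefig("corner.png")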
  61. (1) initialization, (2) bounds, (3) convergence, (4) gradients

  62. (1) initialization, (2) priors, (3) convergence, (4) gradients?

  63. Other MCMC samplers in Python: (1) pymc-devs/pymc3, (2) stan-dev/pystan, (3) JohannesBuchner/PyMultiNest, (4) eggplantbren/DNest4

  64. Other MCMC samplers in Python (as above), noting hierarchical inference

  65. Other MCMC samplers in Python (as above), noting hierarchical inference and nested sampling (a PyMC3 sketch follows below)
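      A rough sketch (not from the slides) of the same sigmoid fit written as a PyMC3 model, assuming x, y, yerr as before; the priors here are made-up examples, not the talk's choices.

      import pymc3 as pm

      with pm.Model():
          a = pm.Normal("a", mu=0.0, sd=10.0)
          b = pm.Normal("b", mu=0.0, sd=10.0)
          c = pm.Normal("c", mu=0.0, sd=10.0)
          mu = a / (1.0 + pm.math.exp(-b * (x - c)))
          pm.Normal("obs", mu=mu, sd=yerr, observed=y)
          trace = pm.sample(1000)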
  66. in summary…

  67. Physics Data mean model (physical parameters → predicted data) noise

    (stochastic; instrument, systematics, etc.) inference (parameter estimation) If your data analysis problem looks like this… * * it probably does
  68. … now you know how to solve it! * *

    in theory https://speakerdeck.com/dfm/pyastro16