pyastro16

"Tools for probabilistic data analysis in Python" from the PyAstro16 conference

Dan Foreman-Mackey

March 23, 2016

Transcript

  1. Tools for Probabilistic Data Analysis in Python * Dan Foreman-Mackey

    | #pyastro16
  2. Tools for Probabilistic Data Analysis in Python * Dan Foreman-Mackey

    | #pyastro16 * in 15 minutes
  3. What have I done?

  4. Tools for Probabilistic Data Analysis in Python

  5. Physics Data

  6. Physics Data mean model (physical parameters → predicted data)

  7. Physics Data mean model (physical parameters → predicted data) noise

    (stochastic; instrument, systematics, etc.)
  8. Physics Data mean model (physical parameters → predicted data) noise

    (stochastic; instrument, systematics, etc.) inference (parameter estimation)
  9. A few examples: (1) linear regression, (2) maximum likelihood, (3) uncertainty quantification
  10. Linear regression

  11. Linear regression if you have: a linear mean model and

    known Gaussian uncertainties and you want: "best" parameters and uncertainties
  12. Linear (mean) models: y = m x + b

  13. Linear (mean) models: y = m x + b; y = a2 x^2 + a1 x + a0

  14. Linear (mean) models: y = m x + b; y = a2 x^2 + a1 x + a0; y = a sin(x + w)

  15. Linear (mean) models: y = m x + b; y = a2 x^2 + a1 x + a0; y = a sin(x + w), plus known Gaussian uncertainties
  16. Linear regression

  17. Linear regression

  18. Linear regression

      # x, y, yerr are numpy arrays of the same shape
      import numpy as np
      A = np.vander(x, 2)
      ATA = np.dot(A.T, A / yerr[:, None]**2)
      sigma_w = np.linalg.inv(ATA)
      mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))

  19. Linear regression (same code as slide 18), showing the design matrix
      A = [[x_1, 1], [x_2, 1], ..., [x_n, 1]]

  20. Linear regression (same code), showing A as above and the weight vector w = (m, b)^T

  21. Linear regression (same code). That's it! (in other words: "Don't use MCMC for linear regression!") (a short usage sketch follows below)
  22. Maximum likelihood

  23. Maximum likelihood if you have: a non-linear mean model and/or

    non-Gaussian/unknown noise and you want: "best" parameters
  24. Likelihoods "probability of the data given physics" p(data | physics)

    parameterized by some parameters
  25. Example likelihood function (log-likelihood; f_θ is the mean model):
      ln p({y_n} | θ) = -1/2 Σ_{n=1..N} [y_n - f_θ(x_n)]^2 / σ_n^2 + constant
  26. Likelihoods

  27. Likelihoods SciPy

  28. Likelihoods

      # x, y, yerr are numpy arrays of the same shape
      import numpy as np
      from scipy.optimize import minimize

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def neg_log_like(theta):
          return 0.5 * np.sum(((model(theta, x) - y) / yerr)**2)

      r = minimize(neg_log_like, [1.0, 10.0, 1.5])
      print(r)

      ln p({y_n} | θ) = -1/2 Σ_{n=1..N} [y_n - f_θ(x_n)]^2 / σ_n^2 + constant

  29. Likelihoods (same code as slide 28), with the mean model written out:
      f_θ(x_n) = a / (1 + e^{-b (x_n - c)})

  30. Likelihoods (same code), highlighting ln p({y_n} | θ) as the quantity being computed (a short follow-up sketch is below)
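      A short follow-up sketch (not from the slides): scipy's minimize returns an OptimizeResult whose fields hold the best-fit parameters and the objective value at the optimum.

      # r comes from the minimize call above
      a_ml, b_ml, c_ml = r.x            # maximum-likelihood values of a, b, c
      print("converged:", r.success)
      print("negative log-likelihood at optimum:", r.fun)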
  31. "But it doesn't work…" — everyone

  32. (1) initialization, (2) bounds, (3) convergence, (4) gradients

  33. (1) initialization, (2) bounds, (3) convergence, (4) gradients (a sketch addressing initialization and bounds follows below)
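      A minimal sketch (not from the slides) of handling points (1) and (2): try several initializations and pass simple box bounds to scipy.optimize.minimize. The starting points and bound values here are made-up examples, and neg_log_like is the function defined on the earlier slides.

      from scipy.optimize import minimize

      bounds = [(0.0, 10.0), (0.0, 100.0), (0.0, 5.0)]  # hypothetical limits on a, b, c
      best = None
      for x0 in ([1.0, 10.0, 1.5], [0.5, 5.0, 1.0], [2.0, 20.0, 2.0]):
          r = minimize(neg_log_like, x0, method="L-BFGS-B", bounds=bounds)
          if best is None or r.fun < best.fun:
              best = r
      print(best.x, best.fun)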

  34. Gradients: d/dθ ln p({y_n} | θ)

  35. seriously?

  36. AutoDiff to the rescue! "The most criminally underused tool in

    the [PyAstro] toolkit" — adapted from justindomke.wordpress.com
  37. AutoDiff "Compile" time exact gradients

  38. AutoDiff "Compile" time chain rule

  39. AutoDiff (an expanded sketch follows below)

      GradType sin(GradType x):
          return GradType(sin(x.value), x.grad * cos(x.value))
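      A minimal forward-mode sketch (not from the slides) of the GradType idea: each value carries its derivative alongside it, and every operation applies the chain rule. The class and function names here are illustrative.

      import math

      class GradType(object):
          def __init__(self, value, grad=0.0):
              self.value = value
              self.grad = grad

      def grad_sin(x):
          # chain rule: d/dt sin(x(t)) = cos(x(t)) * x'(t)
          return GradType(math.sin(x.value), x.grad * math.cos(x.value))

      x = GradType(0.3, 1.0)   # seed dx/dx = 1 to differentiate with respect to x
      y = grad_sin(x)
      print(y.value, y.grad)   # sin(0.3) and cos(0.3)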
  40. AutoDiff "Compile" time exact gradients

  41. AutoDiff in Python: (1) Theano: deeplearning.net/software/theano; (2) HIPS/autograd: github.com/HIPS/autograd

  42. HIPS/autograd just works

      import autograd.numpy as np
      from autograd import elementwise_grad

      def f(x):
          y = np.exp(-x)
          return (1.0 - y) / (1.0 + y)

      df = elementwise_grad(f)
      ddf = elementwise_grad(df)

  43. HIPS/autograd just works (same code), with a plot of f(x), f'(x), and f''(x) for x from -4 to 4 (a short usage sketch follows below)
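      A short usage sketch (not from the slides): evaluate f and its autograd derivatives elementwise on a grid like the one in the plot above; the grid xs is an assumption.

      xs = np.linspace(-4, 4, 101)
      print(f(xs)[:3])
      print(df(xs)[:3])    # first derivative, elementwise
      print(ddf(xs)[:3])   # second derivative, elementwise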
  44. before autograd

      # x, y, yerr are numpy arrays of the same shape
      import numpy as np
      from scipy.optimize import minimize

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def neg_log_like(theta):
          r = (y - model(theta, x)) / yerr
          return 0.5 * np.sum(r*r)

      r = minimize(neg_log_like, [1.0, 10.0, 1.5])
      print(r)

  45. after autograd

      # x, y, yerr are numpy arrays of the same shape
      from autograd import grad
      import autograd.numpy as np
      from scipy.optimize import minimize

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def neg_log_like(theta):
          r = (y - model(theta, x)) / yerr
          return 0.5 * np.sum(r*r)

      r = minimize(neg_log_like, [1.0, 10.0, 1.5], jac=grad(neg_log_like))
      print(r)

  46. after autograd (same code as slide 45)

  47. after autograd (same code), with call counts: 115 calls vs 66 calls
  48. HIPS/autograd just works but… HIPS/autograd is not super fast

  49. HIPS/autograd just works but… HIPS/autograd is not super fast; you might need to drop down to a compiled language

  50. HIPS/autograd just works but… HIPS/autograd is not super fast; you might need to drop down to a compiled language; or...
  51. Use Julia?

  52. Uncertainty quantification

  53. Uncertainty quantification if you have: a non-linear mean model and/or

    non-Gaussian/unknown noise and you want: parameter uncertainties
  54. Uncertainty: p(physics | data) ∝ p(data | physics) p(physics), i.e. the distribution of physical parameters consistent with the data (posterior ∝ likelihood × prior)
  55. You're going to have to SAMPLE (photo: Flickr user Franz Jachim, Creative Commons)
  56. MCMC sampling

  57. MCMC sampling: it's hammer time! emcee, The MCMC Hammer

  58. MCMC sampling with emcee dfm.io/emcee; github.com/dfm/emcee

  59. MCMC sampling with emcee

      # x, y, yerr are numpy arrays of the same shape
      import emcee
      import numpy as np

      def model(theta, x):
          a, b, c = theta
          return a / (1 + np.exp(-b * (x - c)))

      def log_prob(theta):
          log_prior = 0.0
          r = (y - model(theta, x)) / yerr
          return -0.5 * np.sum(r*r) + log_prior

      ndim, nwalkers = 3, 32
      p0 = np.array([1.0, 10.0, 1.5])
      p0 = p0 + 0.01*np.random.randn(nwalkers, ndim)
      sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
      sampler.run_mcmc(p0, 1000)
  60. MCMC sampling with emcee: corner plot of the posterior samples for a, b, and c, for the mean model f_θ(x_n) = a / (1 + e^{-b (x_n - c)}); made using: github.com/dfm/corner.py (a plotting sketch follows below)
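      A minimal sketch (not from the slides) of producing a corner plot like the one above with corner.py, assuming the sampler from the previous slide and a made-up burn-in of 100 steps.

      import corner

      # sampler.chain has shape (nwalkers, nsteps, ndim); drop burn-in and flatten
      samples = sampler.chain[:, 100:, :].reshape((-1, ndim))
      fig = corner.corner(samples, labels=["a", "b", "c"])
      fig.savefig("corner.png")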
  61. (1) initialization, (2) bounds, (3) convergence, (4) gradients

  62. (1) initialization, (2) priors, (3) convergence, (4) gradients?

  63. Other MCMC samplers in Python: (1) pymc-devs/pymc3, (2) stan-dev/pystan, (3) JohannesBuchner/PyMultiNest, (4) eggplantbren/DNest4

  64. Other MCMC samplers in Python (as above), noting hierarchical inference

  65. Other MCMC samplers in Python (as above), noting hierarchical inference and nested sampling (a PyMC3 sketch follows below)
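      A rough sketch (not from the slides) of the same sigmoid fit written as a PyMC3 model, assuming x, y, yerr as before; the priors here are made-up examples, not the talk's choices.

      import pymc3 as pm

      with pm.Model():
          a = pm.Normal("a", mu=0.0, sd=10.0)
          b = pm.Normal("b", mu=0.0, sd=10.0)
          c = pm.Normal("c", mu=0.0, sd=10.0)
          mu = a / (1.0 + pm.math.exp(-b * (x - c)))
          pm.Normal("obs", mu=mu, sd=yerr, observed=y)
          trace = pm.sample(1000)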
  66. in summary…

  67. Physics Data mean model (physical parameters → predicted data) noise

    (stochastic; instrument, systematics, etc.) inference (parameter estimation) If your data analysis problem looks like this… * * it probably does
  68. … now you know how to solve it! * *

    in theory https://speakerdeck.com/dfm/pyastro16