Slide 1

Tools for Probabilistic Data Analysis in Python * Dan Foreman-Mackey | #pyastro16

Slide 2

Tools for Probabilistic Data Analysis in Python * Dan Foreman-Mackey | #pyastro16 * in 15 minutes

Slide 3

What have I done?

Slide 4

Tools for Probabilistic Data Analysis in Python

Slide 5

Physics → Data

Slide 6

Physics → Data: mean model (physical parameters → predicted data)

Slide 7

Physics → Data: mean model (physical parameters → predicted data), plus noise (stochastic; instrument, systematics, etc.)

Slide 8

Physics → Data: mean model (physical parameters → predicted data), plus noise (stochastic; instrument, systematics, etc.); Data → Physics: inference (parameter estimation)

Slide 9

A few examples: 1. linear regression; 2. maximum likelihood; 3. uncertainty quantification

Slide 10

Linear regression

Slide 11

Linear regression: if you have a linear mean model and known Gaussian uncertainties, and you want "best" parameters and uncertainties

Slide 12

Linear (mean) models y = m x + b

Slide 13

Linear (mean) models: y = m x + b; y = a_2 x^2 + a_1 x + a_0

Slide 14

Linear (mean) models: y = m x + b; y = a_2 x^2 + a_1 x + a_0; y = a sin(x + w)

Slide 15

Linear (mean) models: y = m x + b; y = a_2 x^2 + a_1 x + a_0; y = a sin(x + w); + known Gaussian uncertainties

Slide 16

Linear regression

Slide 17

Linear regression
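The code on the next slides assumes x, y, and yerr already exist as numpy arrays. As an illustration that is not part of the talk, one hypothetical way to fabricate such data for a straight-line fit (slope 0.5, intercept 1.0, and noise level 0.1 are all made up):

import numpy as np

np.random.seed(42)
x = np.sort(np.random.uniform(0, 10, 50))            # independent variable
yerr = 0.1 * np.ones_like(x)                         # known Gaussian uncertainties
y = 0.5 * x + 1.0 + yerr * np.random.randn(len(x))   # y = m x + b plus noise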

Slide 18

Linear regression
# x, y, yerr are numpy arrays of the same shape
import numpy as np

A = np.vander(x, 2)
ATA = np.dot(A.T, A / yerr[:, None]**2)
sigma_w = np.linalg.inv(ATA)
mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))

Slide 19

Linear regression
# x, y, yerr are numpy arrays of the same shape
import numpy as np

A = np.vander(x, 2)
ATA = np.dot(A.T, A / yerr[:, None]**2)
sigma_w = np.linalg.inv(ATA)
mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))

A = [[x_1, 1], [x_2, 1], …, [x_n, 1]]

Slide 20

Linear regression
# x, y, yerr are numpy arrays of the same shape
import numpy as np

A = np.vander(x, 2)
ATA = np.dot(A.T, A / yerr[:, None]**2)
sigma_w = np.linalg.inv(ATA)
mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))

A = [[x_1, 1], [x_2, 1], …, [x_n, 1]]
w = (m, b)^T
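As a quick check (not on the slide), np.vander(x, 2) builds exactly this design matrix: a column of x next to a column of ones. For example:

import numpy as np
print(np.vander(np.array([1.0, 2.0, 3.0]), 2))
# [[1. 1.]
#  [2. 1.]
#  [3. 1.]]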

Slide 21

Linear regression
# x, y, yerr are numpy arrays of the same shape
import numpy as np

A = np.vander(x, 2)
ATA = np.dot(A.T, A / yerr[:, None]**2)
sigma_w = np.linalg.inv(ATA)
mean_w = np.linalg.solve(ATA, np.dot(A.T, y / yerr**2))

That's it! (in other words: "Don't use MCMC for linear regression!")

A = [[x_1, 1], [x_2, 1], …, [x_n, 1]]
w = (m, b)^T
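A small usage sketch (an addition, not from the slides): with mean_w and sigma_w computed as above, the best-fit slope and intercept and their 1-sigma uncertainties come from the mean vector and the diagonal of the covariance:

m, b = mean_w
m_err, b_err = np.sqrt(np.diag(sigma_w))
print("m =", m, "+/-", m_err)
print("b =", b, "+/-", b_err)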

Slide 22

Maximum likelihood

Slide 23

Maximum likelihood: if you have a non-linear mean model and/or non-Gaussian/unknown noise, and you want "best" parameters

Slide 24

Likelihoods "probability of the data given physics" p(data | physics) parameterized by some parameters

Slide 25

Example likelihood function ("chi-squared")
log-likelihood: ln p({y_n} | θ) = -1/2 ∑_{n=1}^{N} [y_n - f_θ(x_n)]² / σ_n² + constant
where f_θ is the mean model

Slide 26

Likelihoods

Slide 27

Likelihoods SciPy

Slide 28

Likelihoods
# x, y, yerr are numpy arrays of the same shape
import numpy as np
from scipy.optimize import minimize

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def neg_log_like(theta):
    return 0.5 * np.sum(((model(theta, x) - y) / yerr)**2)

r = minimize(neg_log_like, [1.0, 10.0, 1.5])
print(r)

ln p({y_n} | θ) = -1/2 ∑_{n=1}^{N} [y_n - f_θ(x_n)]² / σ_n² + constant

Slide 29

Likelihoods (code as on the previous slide)
f_θ(x_n) = a / (1 + e^{-b (x_n - c)})
ln p({y_n} | θ) = -1/2 ∑_{n=1}^{N} [y_n - f_θ(x_n)]² / σ_n² + constant

Slide 30

Likelihoods (code as on slide 28; neg_log_like implements -ln p({y_n} | θ))
f_θ(x_n) = a / (1 + e^{-b (x_n - c)})
ln p({y_n} | θ) = -1/2 ∑_{n=1}^{N} [y_n - f_θ(x_n)]² / σ_n² + constant

Slide 31

"But it doesn't work…" — everyone

Slide 32

1. initialization; 2. bounds; 3. convergence; 4. gradients

Slide 33

1. initialization; 2. bounds; 3. convergence; 4. gradients
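For items 2 and 3, a hypothetical variant of the earlier neg_log_like fit that adds box bounds and checks convergence (the bound values are invented for illustration):

from scipy.optimize import minimize

r = minimize(neg_log_like, [1.0, 10.0, 1.5], method="L-BFGS-B",
             bounds=[(0.0, 10.0), (0.0, 100.0), (0.0, 5.0)])
if not r.success:
    print("Optimizer did not converge:", r.message)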

Slide 34

Gradients: d/dθ ln p({y_n} | θ)

Slide 35

seriously?

Slide 36

AutoDiff to the rescue! "The most criminally underused tool in the [PyAstro] toolkit" — adapted from justindomke.wordpress.com

Slide 37

AutoDiff "Compile" time exact gradients

Slide 38

AutoDiff "Compile" time chain rule

Slide 39

AutoDiff GradType sin(GradType x): return GradType( sin(x.value), x.grad * cos(x.value) )
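A minimal Python sketch of the same idea (GradType and this sin follow the slide's pseudocode names; this is an illustration, not a real library):

import math

class GradType:
    # a "dual number": a value carried together with its derivative
    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

def sin(x):
    # chain rule: d/dt sin(x(t)) = cos(x) * dx/dt
    return GradType(math.sin(x.value), x.grad * math.cos(x.value))

x = GradType(0.5, 1.0)    # seed with dx/dx = 1
y = sin(x)
print(y.value, y.grad)    # sin(0.5) and its exact derivative cos(0.5)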

Slide 40

AutoDiff "Compile" time exact gradients

Slide 41

AutoDiff in Python: 1. Theano: deeplearning.net/software/theano; 2. HIPS/autograd: github.com/HIPS/autograd

Slide 42

HIPS/autograd just works
import autograd.numpy as np
from autograd import elementwise_grad

def f(x):
    y = np.exp(-x)
    return (1.0 - y) / (1.0 + y)

df = elementwise_grad(f)
ddf = elementwise_grad(df)

Slide 43

HIPS/autograd just works
import autograd.numpy as np
from autograd import elementwise_grad

def f(x):
    y = np.exp(-x)
    return (1.0 - y) / (1.0 + y)

df = elementwise_grad(f)
ddf = elementwise_grad(df)

[Plot: f(x), f'(x), and f''(x) for x in [-4, 4]]
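For example (a guess at how curves like those in the plot could be evaluated; the grid is arbitrary):

xs = np.linspace(-4, 4, 100)
fx, dfx, ddfx = f(xs), df(xs), ddf(xs)   # arrays ready to plot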

Slide 44

before autograd
# x, y, yerr are numpy arrays of the same shape
import numpy as np
from scipy.optimize import minimize

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def neg_log_like(theta):
    r = (y - model(theta, x)) / yerr
    return 0.5 * np.sum(r*r)

r = minimize(neg_log_like, [1.0, 10.0, 1.5])
print(r)

Slide 45

after autograd
# x, y, yerr are numpy arrays of the same shape
from autograd import grad
import autograd.numpy as np
from scipy.optimize import minimize

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def neg_log_like(theta):
    r = (y - model(theta, x)) / yerr
    return 0.5 * np.sum(r*r)

r = minimize(neg_log_like, [1.0, 10.0, 1.5],
             jac=grad(neg_log_like))
print(r)

Slide 46

after autograd (same code as the previous slide)

Slide 47

after autograd (same code as slide 45): 115 calls → 66 calls
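The call counts shown here presumably come from the scipy.optimize result object; for example (attribute names are scipy's):

print(r.nfev)   # number of objective-function evaluations
print(r.njev)   # number of gradient (Jacobian) evaluations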

Slide 48

HIPS/autograd just works but… HIPS/autograd is not super fast

Slide 49

HIPS/autograd just works but… HIPS/autograd is not super fast you might need to drop down to a compiled language

Slide 50

HIPS/autograd just works but… HIPS/autograd is not super fast you might need to drop down to a compiled language or...

Slide 51

Use Julia?

Slide 52

Uncertainty quantification

Slide 53

Uncertainty quantification: if you have a non-linear mean model and/or non-Gaussian/unknown noise, and you want parameter uncertainties

Slide 54

Uncertainty: p(physics | data) ∝ p(data | physics) × p(physics), where p(physics | data) is the distribution of physical parameters consistent with the data, p(data | physics) is the likelihood, and p(physics) is the prior

Slide 55

You're going to have to SAMPLE [CC-licensed photo: Flickr user Franz Jachim]

Slide 56

MCMC sampling

Slide 57

MCMC sampling: it's hammer time! emcee, The MCMC Hammer

Slide 58

MCMC sampling with emcee dfm.io/emcee; github.com/dfm/emcee

Slide 59

MCMC sampling with emcee
# x, y, yerr are numpy arrays of the same shape
import emcee
import numpy as np

def model(theta, x):
    a, b, c = theta
    return a / (1 + np.exp(-b * (x - c)))

def log_prob(theta):
    log_prior = 0.0
    r = (y - model(theta, x)) / yerr
    return -0.5 * np.sum(r*r) + log_prior

ndim, nwalkers = 3, 32
p0 = np.array([1.0, 10.0, 1.5])
p0 = p0 + 0.01*np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 1000)

Slide 60

MCMC sampling with emcee
[Corner plot of the posterior samples for a, b, and c; made using: github.com/dfm/corner.py]
f_θ(x_n) = a / (1 + e^{-b (x_n - c)})
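A plausible recipe for a plot like this one (not from the deck; the burn-in length is made up, and sampler.chain is the emcee 2.x attribute current at the time of the talk):

import corner

samples = sampler.chain[:, 100:, :].reshape((-1, ndim))   # drop burn-in, flatten walkers
fig = corner.corner(samples, labels=["a", "b", "c"])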

Slide 61

1. initialization; 2. bounds; 3. convergence; 4. gradients

Slide 62

1. initialization; 2. priors; 3. convergence; 4. gradients?
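On item 2 (priors): the emcee example above used log_prior = 0.0, i.e. an improper flat prior. A hedged sketch of a proper top-hat prior instead (the bounds are invented for illustration):

def log_prior(theta):
    a, b, c = theta
    if 0.0 < a < 10.0 and 0.0 < b < 100.0 and 0.0 < c < 5.0:
        return 0.0
    return -np.inf

def log_prob(theta):
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    r = (y - model(theta, x)) / yerr
    return lp - 0.5 * np.sum(r*r)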

Slide 63

Other MCMC samplers in Python: 1. pymc-devs/pymc3; 2. stan-dev/pystan; 3. JohannesBuchner/PyMultiNest; 4. eggplantbren/DNest4

Slide 64

Other MCMC samplers in Python: 1. pymc-devs/pymc3 and 2. stan-dev/pystan (hierarchical inference); 3. JohannesBuchner/PyMultiNest; 4. eggplantbren/DNest4

Slide 65

Other MCMC samplers in Python: 1. pymc-devs/pymc3 and 2. stan-dev/pystan (hierarchical inference); 3. JohannesBuchner/PyMultiNest and 4. eggplantbren/DNest4 (nested sampling)
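For flavor, a rough sketch of what the same logistic fit might look like in PyMC3 (not from the talk; the prior bounds are made up and the interface shown is the PyMC3-era API):

import pymc3 as pm

with pm.Model():
    a = pm.Uniform("a", 0.0, 10.0)
    b = pm.Uniform("b", 0.0, 100.0)
    c = pm.Uniform("c", 0.0, 5.0)
    mu = a / (1.0 + pm.math.exp(-b * (x - c)))
    pm.Normal("obs", mu=mu, sd=yerr, observed=y)
    trace = pm.sample(1000)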

Slide 66

in summary…

Slide 67

Physics → Data: mean model (physical parameters → predicted data), plus noise (stochastic; instrument, systematics, etc.); Data → Physics: inference (parameter estimation). If your data analysis problem looks like this…* (* it probably does)

Slide 68

… now you know how to solve it!* (* in theory) https://speakerdeck.com/dfm/pyastro16