TensorFlow for astronomers

Slide 1

Slide 1 text

TensorFlow 101 for Astronomers A non-traditional introduction to model building with TensorFlow Dan Foreman-Mackey CCA@Flatiron / dfm.io / @exoplaneteer / github.com/dfm

Slide 2

Slide 2 text

WARNING ⚠ This is not a talk about Machine Learning™. If you're only here for the neural networks, please feel free to leave now <3

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

WARNING ⚠ I am not a TensorFlow expert. I only started learning a few months ago. That being said, I'm stoked.

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

So why am I here?

Slide 8

Slide 8 text

Blame Peter.

Slide 9

Slide 9 text

astronomy TensorFlow https://trends.google.com/trends/explore?date=2016-01-01%202018-04-01&q=TensorFlow,astronomy

Slide 10

Slide 10 text

I think that TensorFlow has the potential to be broadly used for data analysis problems all across astronomy. But seriously...

Slide 11

Slide 11 text

NumPy + special sauce, basically. What is TensorFlow? This is a very biased description...

Slide 12

Slide 12 text

NumPy + special sauce, basically. What is TensorFlow? This is a very biased description... Fast linear algebra, GPU support, automatic differentiation, ...

Slide 13

Slide 13 text

Introduce the why and how of TensorFlow. Demonstrate that TensorFlow isn't just for machine learning. Sketch the procedure for extending TensorFlow with custom C++ operations. My goals for today

Slide 14

Slide 14 text

GPU support for some operations (esp. linear algebra). Extensible with custom C/C++ code. Simple (Python) interface for ﬂexible but high-performance model building + gradients. Did I mention GRADIENTS?!? Why should you care?

Slide 15

Slide 15 text

All of science p(data | model)

Slide 16

Slide 16 text

All of science p(data | model) Then, use some numerical algorithm to ﬁt for the model parameters... e.g. scipy.optimize.minimize, MCMC, etc.

Slide 17

Slide 17 text

All of science p(data | model) A lot of code... Then, use some numerical algorithm to ﬁt for the model parameters... e.g. scipy.optimize.minimize, MCMC, etc.

Slide 18

Slide 18 text

All of science Then, use some numerical algorithm to ﬁt for the model parameters... e.g. scipy.optimize.minimize, MCMC, etc. d p(data | model) d model A lot of code...

Slide 19

Slide 19 text

All of science OMG SO MANY CODEZ!!!!! Then, use some numerical algorithm to ﬁt for the model parameters... e.g. scipy.optimize.minimize, MCMC, etc. d p(data | model) d model

Slide 20

Slide 20 text

Now what if you want to change the parameterization?

Slide 21

Slide 21 text

Simple Python interface, but C speed (or faster) easy to implement/debug, fits into existing Python stack easily, etc. Automatic differentiation of models no need to manually compute gradients, even if parameterization changes Extensibility and customization you can incorporate your existing C code (with some caveats) TensorFlow offers

Slide 22

Slide 22 text

Automatic diﬀerentiation An aside

Slide 23

Slide 23 text

y = f(x) your computer program

Slide 24

Slide 24 text

schematically..... y = f1(f2(· · · fN (x) · · · ))

Slide 25

Slide 25 text

y0 =f1 0(f2(· · · fN (x) · · · )) f2 0(· · · fN (x) · · · ) · · · fN 0(x) hint: this is the chain rule

Slide 26

Slide 26 text

The chain rule + compiled code... ...leads to eﬃcient calculation of gradients complicated models. Automatic diﬀerentiation

Slide 27

Slide 27 text

Linear regression A simple example

Slide 28

Slide 28 text

pip install tensorﬂow To start

Slide 29

Slide 29 text

import numpy as np ivar = 1.0 / yerr**2 A = np.vander(x, 2) ATA = np.dot(A.T, A*ivar[:, None]) ATy = np.dot(A.T, y*ivar) w = np.linalg.solve(ATA, ATy) print(w) Linear regression

Slide 30

Slide 30 text

import numpy as np ivar = 1.0 / yerr**2 A = np.vander(x, 2) ATA = np.dot(A.T, A*ivar[:, None]) ATy = np.dot(A.T, y*ivar) w = np.linalg.solve(ATA, ATy) print(w) w = (AT C 1 A) 1 (AT C 1 y) Linear regression

Slide 31

Slide 31 text

import tensorflow as tf ivar = 1.0 / yerr**2 A = np.vander(x, 2) ATA = tf.matmul(A, A*ivar[:, None], transpose_a=True) ATy = tf.matmul(A, (y*ivar)[:, None], transpose_a=True) w = tf.linalg.solve(ATA, ATy) print(tf.Session().run(w)) Linear regression

Slide 32

Slide 32 text

Slide 33

Slide 33 text

deﬁne operation "graph" import tensorflow as tf ivar = 1.0 / yerr**2 A = np.vander(x, 2) ATA = tf.matmul(A, A*ivar[:, None], transpose_a=True) ATy = tf.matmul(A, (y*ivar)[:, None], transpose_a=True) w = tf.linalg.solve(ATA, ATy) print(tf.Session().run(w)) >>> print(w) Tensor("MatrixSolve_12:0", shape=(2, 1), dtype=float64) Linear regression

Slide 34

Slide 34 text

execute the operations deﬁne operation "graph" import tensorflow as tf ivar = 1.0 / yerr**2 A = np.vander(x, 2) ATA = tf.matmul(A, A*ivar[:, None], transpose_a=True) ATy = tf.matmul(A, (y*ivar)[:, None], transpose_a=True) w = tf.linalg.solve(ATA, ATy) print(tf.Session().run(w)) >>> print(w) Tensor("MatrixSolve_12:0", shape=(2, 1), dtype=float64) Linear regression

Slide 35

Slide 35 text

Linear regression with unknown error bars Another simple example L = 1 2 N X n=1  (yn m xn b)2 s2 + log(2 ⇡ s2)

Slide 36

Slide 36 text

m = tf.Variable(0.5, dtype=tf.float64) b = tf.Variable(-0.2, dtype=tf.float64) s = tf.Variable(0.1, dtype=tf.float64) model = m * x + b log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2) log_like -= 0.5 * len(x) * tf.log(2*np.pi*s**2) L = 1 2 N X n=1  (yn m xn b)2 s2 + log(2 ⇡ s2)

Slide 37

Slide 37 text

m = tf.Variable(0.5, dtype=tf.float64) b = tf.Variable(-0.2, dtype=tf.float64) s = tf.Variable(0.1, dtype=tf.float64) model = m * x + b log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2) log_like -= 0.5 * len(x) * tf.log(2*np.pi*s**2) with tf.Session() as session: session.run(tf.global_variables_initializer()) print(session.run(log_like))

Slide 38

Slide 38 text

m = tf.Variable(0.5, dtype=tf.float64) b = tf.Variable(-0.2, dtype=tf.float64) s = tf.Variable(0.1, dtype=tf.float64) model = m * x + b log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2) log_like -= 0.5 * len(x) * tf.log(2*np.pi*s**2) with tf.Session() as session: session.run(tf.global_variables_initializer()) print(session.run(log_like)) print(session.run(tf.gradients(log_like, [m, b, s])))

Slide 39

Slide 39 text

m = tf.Variable(0.5, dtype=tf.float64) b = tf.Variable(-0.2, dtype=tf.float64) log_s = tf.Variable(np.log(0.1), dtype=tf.float64) s = tf.exp(log_s) model = m * x + b log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2) log_like -= 0.5 * len(x) * tf.log(2*np.pi*s**2) with tf.Session() as session: session.run(tf.global_variables_initializer()) print(session.run(log_like)) print(session.run(tf.gradients(log_like, [m, b, log_s])))

Slide 40

Slide 40 text

opt = tf.contrib.opt.ScipyOptimizerInterface( -log_like, [m, b, log_s]) opt.minimize(sess) INFO:tensorflow:Optimization terminated with: Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' Objective function value: -20.356501 Number of iterations: 19 Number of functions evaluations: 21 To ﬁt...

Slide 41

Slide 41 text

You already have code written in C/C++/Fortran/etc. You can use that inside of TensorFlow (with only a little boilerplate). Extending TensorFlow

Slide 42

Slide 42 text

A solver for Kepler's equation... Extending TensorFlow

Slide 43

Slide 43 text

https://github.com/dfm/AstroFlow/blob/master/astroﬂow/ops/kepler_op.cc template class KeplerOp : public OpKernel { public: explicit KeplerOp(OpKernelConstruction* context) : OpKernel(context) { OP_REQUIRES_OK(context, context->GetAttr("maxiter", &maxiter_)); OP_REQUIRES(context, maxiter_ >= 0, errors::InvalidArgument("Need maxiter >= 0, got ", maxiter_)); OP_REQUIRES_OK(context, context->GetAttr("tol", &tol_)); } void Compute(OpKernelContext* context) override { const Tensor& M_tensor = context->input(0); const Tensor& e_tensor = context->input(1); const int64 N = M_tensor.NumElements(); Tensor* E_tensor = NULL; OP_REQUIRES_OK(context, context->allocate_output(0, M_tensor.shape(), &E_tensor)); const auto M = M_tensor.template flat(); const auto e = e_tensor.template scalar()(0); auto E = E_tensor->template flat(); for (int64 n = 0; n < N; ++n) { E(n) = kepler(M(n), e, maxiter_, tol_); } } private: int maxiter_; float tol_; };

Slide 44

Slide 44 text

Slide 45

Slide 45 text

What about gradients...?

Slide 46

Slide 46 text

https://github.com/dfm/AstroFlow/blob/master/astroﬂow/astroﬂow.py @tf.RegisterGradient("Kepler") def _kepler_grad(op, *grads): M, e = op.inputs E = op.outputs[0] bE = grads[0] bM = bE / (1.0 - e * tf.cos(E)) be = tf.reduce_sum(tf.sin(E) * bM) return [bM, be]

Slide 47

Slide 47 text

Introduce the why and how of TensorFlow. Demonstrate that TensorFlow isn't just for machine learning. Sketch the procedure for extending TensorFlow with custom C++ operations. My goals for today

Slide 48

Slide 48 text

github.com/dfm/tf-tutorial Up next...