TensorFlow for astronomers

Dan Foreman-Mackey

April 06, 2018

Transcript

  1. TensorFlow 101 for Astronomers: a non-traditional introduction to
     model building with TensorFlow.
     Dan Foreman-Mackey / CCA@Flatiron / dfm.io / @exoplaneteer / github.com/dfm
  2. WARNING ⚠ This is not a talk about Machine Learning™. If you're
     only here for the neural networks, please feel free to leave now <3
  3. WARNING ⚠ I am not a TensorFlow expert. I only started learning a
     few months ago. That being said, I'm stoked.
  4. But seriously... I think that TensorFlow has the potential to be
     broadly used for data analysis problems all across astronomy.
  5. What is TensorFlow? NumPy + special sauce, basically: fast linear
     algebra, GPU support, automatic differentiation, ... (This is a
     very biased description...)
  6. My goals for today: Introduce the why and how of TensorFlow.
     Demonstrate that TensorFlow isn't just for machine learning. Sketch
     the procedure for extending TensorFlow with custom C++ operations.
  7. Why should you care? GPU support for some operations (esp. linear
     algebra). Extensible with custom C/C++ code. Simple (Python)
     interface for flexible but high-performance model building +
     gradients. Did I mention GRADIENTS?!?
  8. All of science

     p(data | model)

     Then, use some numerical algorithm to fit for the model
     parameters... e.g. scipy.optimize.minimize, MCMC, etc.
  9. All of science

     p(data | model)

     Then, use some numerical algorithm to fit for the model
     parameters... e.g. scipy.optimize.minimize, MCMC, etc.
     (A lot of code...)
 10. All of science

     d p(data | model) / d model

     Then, use some numerical algorithm to fit for the model
     parameters... e.g. scipy.optimize.minimize, MCMC, etc.
     (A lot of code...)
 11. All of science

     d p(data | model) / d model

     Then, use some numerical algorithm to fit for the model
     parameters... e.g. scipy.optimize.minimize, MCMC, etc.
     (OMG SO MANY CODEZ!!!!!)
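To make the "fit with some numerical algorithm" step concrete, here is a minimal sketch of that workflow in plain NumPy/SciPy, for a hypothetical straight-line model; the data arrays x, y, and yerr are assumptions, not from the slides:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_like(params, x, y, yerr):
        # Negative Gaussian log-likelihood for a straight-line model.
        m, b = params
        return 0.5 * np.sum(((y - (m * x + b)) / yerr) ** 2)

    result = minimize(neg_log_like, x0=[0.5, -0.2], args=(x, y, yerr))
    print(result.x)  # best-fit slope and intercept

For every new model (or reparameterization), you would rewrite neg_log_like and, for gradient-based methods, its derivatives by hand. That is the "lot of code" the slides are pointing at.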
 12. TensorFlow offers

     Simple Python interface, but C speed (or faster): easy to
     implement/debug, fits into the existing Python stack easily, etc.

     Automatic differentiation of models: no need to manually compute
     gradients, even if the parameterization changes.

     Extensibility and customization: you can incorporate your existing
     C code (with some caveats).
 13. $y' = f_1'(f_2(\cdots f_N(x) \cdots)) \, f_2'(\cdots f_N(x) \cdots) \cdots f_N'(x)$

     hint: this is the chain rule
 14. Automatic differentiation

     The chain rule + compiled code... ...leads to efficient calculation
     of the gradients of complicated models.
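As a minimal sketch of what this buys you (this example is not from the slides; the composed function is arbitrary): TensorFlow applies the chain rule through a composition of operations automatically, and the result matches the derivative you would write by hand.

    import numpy as np
    import tensorflow as tf

    t = tf.constant(0.5, dtype=tf.float64)
    f = tf.sin(tf.exp(t))         # f = f1(f2(t)) with f1 = sin, f2 = exp
    dfdt = tf.gradients(f, t)[0]  # chain rule: cos(exp(t)) * exp(t)

    with tf.Session() as session:
        print(session.run(dfdt))
        print(np.cos(np.exp(0.5)) * np.exp(0.5))  # the same number, by hand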
 15. Linear regression

     import numpy as np

     ivar = 1.0 / yerr**2
     A = np.vander(x, 2)
     ATA = np.dot(A.T, A * ivar[:, None])
     ATy = np.dot(A.T, y * ivar)
     w = np.linalg.solve(ATA, ATy)
     print(w)
 16. Linear regression

     $w = (A^\mathrm{T} C^{-1} A)^{-1} (A^\mathrm{T} C^{-1} y)$

     import numpy as np

     ivar = 1.0 / yerr**2
     A = np.vander(x, 2)
     ATA = np.dot(A.T, A * ivar[:, None])
     ATy = np.dot(A.T, y * ivar)
     w = np.linalg.solve(ATA, ATy)
     print(w)
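The snippets on these slides assume the data arrays x, y, and yerr already exist. A hypothetical setup with simulated data (the true slope 0.5 and intercept -0.2 are assumptions for illustration) makes them runnable end to end:

    import numpy as np

    np.random.seed(42)
    x = np.sort(np.random.uniform(0, 10, 50))  # simulated inputs
    yerr = 0.1 * np.ones_like(x)               # known uncertainties
    y = 0.5 * x - 0.2 + yerr * np.random.randn(len(x))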
 17. Linear regression

     import tensorflow as tf

     ivar = 1.0 / yerr**2
     A = np.vander(x, 2)
     ATA = tf.matmul(A, A * ivar[:, None], transpose_a=True)
     ATy = tf.matmul(A, (y * ivar)[:, None], transpose_a=True)
     w = tf.linalg.solve(ATA, ATy)
     print(tf.Session().run(w))
 18. Linear regression

     import tensorflow as tf

     ivar = 1.0 / yerr**2
     A = np.vander(x, 2)
     ATA = tf.matmul(A, A * ivar[:, None], transpose_a=True)
     ATy = tf.matmul(A, (y * ivar)[:, None], transpose_a=True)
     w = tf.linalg.solve(ATA, ATy)
     print(tf.Session().run(w))

     >>> print(w)
     Tensor("MatrixSolve_12:0", shape=(2, 1), dtype=float64)
 19. Linear regression

     import tensorflow as tf

     # define operation "graph"
     ivar = 1.0 / yerr**2
     A = np.vander(x, 2)
     ATA = tf.matmul(A, A * ivar[:, None], transpose_a=True)
     ATy = tf.matmul(A, (y * ivar)[:, None], transpose_a=True)
     w = tf.linalg.solve(ATA, ATy)

     print(tf.Session().run(w))

     >>> print(w)
     Tensor("MatrixSolve_12:0", shape=(2, 1), dtype=float64)
 20. Linear regression

     import tensorflow as tf

     # define operation "graph"
     ivar = 1.0 / yerr**2
     A = np.vander(x, 2)
     ATA = tf.matmul(A, A * ivar[:, None], transpose_a=True)
     ATy = tf.matmul(A, (y * ivar)[:, None], transpose_a=True)
     w = tf.linalg.solve(ATA, ATy)

     # execute the operations
     print(tf.Session().run(w))

     >>> print(w)
     Tensor("MatrixSolve_12:0", shape=(2, 1), dtype=float64)
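One payoff of writing the same computation as a TensorFlow graph: gradients propagate through tf.linalg.solve, so you can differentiate the fitted weights with respect to the data. A sketch (not from the slides), assuming the arrays from above and feeding y in as a tensor so gradients can flow to it:

    y_t = tf.constant(y, dtype=tf.float64)
    ATA = tf.matmul(A, A * ivar[:, None], transpose_a=True)
    ATy = tf.matmul(A, (y_t * ivar)[:, None], transpose_a=True)
    w = tf.linalg.solve(ATA, ATy)

    # Sensitivity of the fitted slope to every data point, for free:
    dw_dy = tf.gradients(w[0, 0], y_t)[0]
    print(tf.Session().run(dw_dy))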
 21. Another simple example: linear regression with unknown error bars

     $\mathcal{L} = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{(y_n - m x_n - b)^2}{s^2} + \log(2\pi s^2) \right]$
 22. $\mathcal{L} = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{(y_n - m x_n - b)^2}{s^2} + \log(2\pi s^2) \right]$

     m = tf.Variable(0.5, dtype=tf.float64)
     b = tf.Variable(-0.2, dtype=tf.float64)
     s = tf.Variable(0.1, dtype=tf.float64)

     model = m * x + b
     log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2)
     log_like -= 0.5 * len(x) * tf.log(2 * np.pi * s**2)
 23. m = tf.Variable(0.5, dtype=tf.float64)
     b = tf.Variable(-0.2, dtype=tf.float64)
     s = tf.Variable(0.1, dtype=tf.float64)

     model = m * x + b
     log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2)
     log_like -= 0.5 * len(x) * tf.log(2 * np.pi * s**2)

     with tf.Session() as session:
         session.run(tf.global_variables_initializer())
         print(session.run(log_like))
 24. m = tf.Variable(0.5, dtype=tf.float64)
     b = tf.Variable(-0.2, dtype=tf.float64)
     s = tf.Variable(0.1, dtype=tf.float64)

     model = m * x + b
     log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2)
     log_like -= 0.5 * len(x) * tf.log(2 * np.pi * s**2)

     with tf.Session() as session:
         session.run(tf.global_variables_initializer())
         print(session.run(log_like))
         print(session.run(tf.gradients(log_like, [m, b, s])))
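A quick sanity check (not on the slides): the automatic gradient with respect to m should equal the analytic derivative, d log_like / d m = Σ (y_n - m x_n - b) x_n / s²:

    # Analytic derivative of the log-likelihood with respect to the slope.
    analytic = tf.reduce_sum((y - model) * x / s ** 2)
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        print(session.run(tf.gradients(log_like, m)[0]))
        print(session.run(analytic))  # should agree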
 25. m = tf.Variable(0.5, dtype=tf.float64)
     b = tf.Variable(-0.2, dtype=tf.float64)
     log_s = tf.Variable(np.log(0.1), dtype=tf.float64)
     s = tf.exp(log_s)

     model = m * x + b
     log_like = -0.5 * tf.reduce_sum(((y - model) / s) ** 2)
     log_like -= 0.5 * len(x) * tf.log(2 * np.pi * s**2)

     with tf.Session() as session:
         session.run(tf.global_variables_initializer())
         print(session.run(log_like))
         print(session.run(tf.gradients(log_like, [m, b, log_s])))
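The point of the change of variables is that no gradient code needs to be rewritten. By the chain rule, d log_like / d log_s = s * d log_like / d s, and the automatic gradients respect this; a small check (an addition, not from the slides):

    grad_log_s = tf.gradients(log_like, log_s)[0]
    grad_s = tf.gradients(log_like, s)[0]
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        print(session.run([grad_log_s, s * grad_s]))  # the two entries match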
 26. To fit...

     opt = tf.contrib.opt.ScipyOptimizerInterface(
         -log_like, [m, b, log_s])
     opt.minimize(session)

     INFO:tensorflow:Optimization terminated with:
       Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
       Objective function value: -20.356501
       Number of iterations: 19
       Number of functions evaluations: 21
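After minimize returns, the TensorFlow variables hold the best-fit values; a minimal usage sketch, assuming the model from the previous slides:

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        opt = tf.contrib.opt.ScipyOptimizerInterface(
            -log_like, [m, b, log_s])
        opt.minimize(session)
        print(session.run([m, b, tf.exp(log_s)]))  # fitted m, b, and s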
 27. Extending TensorFlow

     You already have code written in C/C++/Fortran/etc. You can use
     that inside of TensorFlow (with only a little boilerplate).
 28. https://github.com/dfm/AstroFlow/blob/master/astroflow/ops/kepler_op.cc

     template <typename T>
     class KeplerOp : public OpKernel {
      public:
       explicit KeplerOp(OpKernelConstruction* context) : OpKernel(context) {
         OP_REQUIRES_OK(context, context->GetAttr("maxiter", &maxiter_));
         OP_REQUIRES(context, maxiter_ >= 0,
                     errors::InvalidArgument("Need maxiter >= 0, got ", maxiter_));
         OP_REQUIRES_OK(context, context->GetAttr("tol", &tol_));
       }

       void Compute(OpKernelContext* context) override {
         const Tensor& M_tensor = context->input(0);
         const Tensor& e_tensor = context->input(1);
         const int64 N = M_tensor.NumElements();
         Tensor* E_tensor = NULL;
         OP_REQUIRES_OK(context,
                        context->allocate_output(0, M_tensor.shape(), &E_tensor));
         const auto M = M_tensor.template flat<T>();
         const auto e = e_tensor.template scalar<T>()(0);
         auto E = E_tensor->template flat<T>();
         for (int64 n = 0; n < N; ++n) {
           E(n) = kepler<T>(M(n), e, maxiter_, tol_);
         }
       }

      private:
       int maxiter_;
       float tol_;
     };
 29. https://github.com/dfm/AstroFlow/blob/master/astroflow/ops/kepler_op.cc
     (Same code as the previous slide.)
 30. https://github.com/dfm/AstroFlow/blob/master/astroflow/astroflow.py

     @tf.RegisterGradient("Kepler")
     def _kepler_grad(op, *grads):
         M, e = op.inputs
         E = op.outputs[0]
         bE = grads[0]
         bM = bE / (1.0 - e * tf.cos(E))
         be = tf.reduce_sum(tf.sin(E) * bM)
         return [bM, be]
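For completeness, a hedged sketch of how a compiled custom op like this might be called from Python. The shared-library name ("kepler_op.so") and the exposed attribute names are assumptions based on the kernel above, not taken from the repository:

    import tensorflow as tf

    mod = tf.load_op_library("./kepler_op.so")

    M = tf.constant([0.1, 1.0, 2.5], dtype=tf.float64)  # mean anomalies
    e = tf.constant(0.3, dtype=tf.float64)              # eccentricity
    E = mod.kepler(M, e, maxiter=200, tol=1e-8)         # eccentric anomaly

    with tf.Session() as session:
        print(session.run(E))
        # Because a gradient was registered for "Kepler", this works too:
        print(session.run(tf.gradients(E, [M, e])))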
 31. My goals for today: Introduce the why and how of TensorFlow.
     Demonstrate that TensorFlow isn't just for machine learning. Sketch
     the procedure for extending TensorFlow with custom C++ operations.