
JAX: Accelerated Machine Learning Research via Composable Function Transformations in Python (Matt Johnson, Google Brain)

This talk is about JAX, a system for high-performance machine learning research and numerical computing. JAX offers the familiarity of Python+NumPy together with hardware acceleration. JAX combines these features with user-wielded function transformations, including automatic differentiation, automatic vectorized batching, end-to-end compilation (via XLA), parallelization over multiple accelerators, and more. Composing these transformations is the key to JAX's power and simplicity. It's used by researchers for a wide range of advanced applications, from large-scale neural net training to probabilistic programming to scientific applications in physics and biology.
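As a minimal sketch of that composability (an illustrative example, not code from the talk), the transformations are ordinary higher-order functions that nest freely:

     import jax.numpy as jnp
     from jax import grad, jit, vmap

     # Illustrative scalar function of a single input (not from the talk).
     def f(x):
         return jnp.sin(x) * x ** 2

     # Differentiate it, vectorize the derivative over a batch, and compile the result.
     df_batched = jit(vmap(grad(f)))

     xs = jnp.linspace(0.0, 1.0, 8)
     print(df_batched(xs))  # elementwise df/dx, evaluated in one compiled XLA call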

Anyscale

July 20, 2021

Transcript

  1. Motivating JAX

     import numpy as np

     def predict(params, inputs):
         for W, b in params:
             outputs = np.dot(inputs, W) + b
             inputs = np.tanh(outputs)
         return outputs
  2. Motivating JAX

     import numpy as np

     def predict(params, inputs):
         for W, b in params:
             outputs = np.dot(inputs, W) + b
             inputs = np.tanh(outputs)
         return outputs

     def loss(params, batch):
         inputs, targets = batch
         preds = predict(params, inputs)
         return np.sum((preds - targets) ** 2)
  3. Motivating JAX: What's missing?

     • Accelerator hardware (GPU/TPU)
     • Training via automatic differentiation
     • Optimized compilation with fusion, memory layout, remat, …
     • Vectorized batching of operations
     • Parallelization over multiple accelerators

     import numpy as np

     def predict(params, inputs):
         for W, b in params:
             outputs = np.dot(inputs, W) + b
             inputs = np.tanh(outputs)
         return outputs

     def loss(params, batch):
         inputs, targets = batch
         preds = predict(params, inputs)
         return np.sum((preds - targets) ** 2)
  4. Motivating JAX

     import jax.numpy as jnp
     from jax import grad, jit, vmap

     def predict(params, inputs):
         for W, b in params:
             outputs = jnp.dot(inputs, W) + b
             inputs = jnp.tanh(outputs)
         return outputs

     def loss(params, batch):
         inputs, targets = batch
         preds = predict(params, inputs)
         return jnp.sum((preds - targets) ** 2)

     gradient_fun = jit(grad(loss))
     perexample_grads = jit(vmap(grad(loss), in_axes=(None, 0)))
  5. Vision Transformer

     An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby. arXiv 2021.
  6. NeRF

     Learned Initializations for Optimizing Coordinate-Based Neural Representations. Tancik, Mildenhall, et al. CVPR 2021.
     Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. Barron et al. arXiv 2021.
     Nerfies: Deformable Neural Radiance Fields. Park et al. arXiv 2020.
  7. JAX, MD for molecular dynamics and simulation

     JAX, MD: A Framework for Differentiable Physics. Schoenholz and Cubuk. NeurIPS 2020.
     Designing self-assembling kinetics with differentiable physics models. Goodrich, King, Schoenholz, Cubuk, and Brenner. PNAS 2021.
  8. DeepMind: FermiNet

     Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks. David Pfau,* James S. Spencer,* Alex G. de G. Matthews, and W. M. C. Foulkes. Physical Review Research 2(3), 033429, September 2020.
     FermiNet: Quantum Physics and Chemistry from First Principles. DeepMind Blog, 2020.
  9. DeepMind: next generation of RL / MuZero work in JAX

     Podracer architectures for scalable Reinforcement Learning. Hessel, Kroiss, et al. arXiv 2021.
     MuZero Sampled and MuZero Unplugged. Hubert, Schrittwieser, et al. arXiv 2021.
  10. DeepMind: next generation of AlphaFold work in JAX

      AlphaFold: a solution to a 50-year-old grand challenge in biology. DeepMind Blog, 2020. (The post refers to work done in TensorFlow.)
  11. MLPerf Training v0.7 results (in seconds, lower is better): 50,000x speedup over 5 years!

      * Google, Research category. † NVIDIA, Available On-Premise category. MLPerf v0.7 Training, closed division. Retrieved from www.mlperf.org 1 December 2020, entries 0.7-64, 0.7-65, 0.7-67, 0.7-30, 0.7-33, 0.7-37, 0.7-38. The MLPerf name and logo are trademarks. See www.mlperf.org for more information.
  12. What are Cloud TPUs?

      A Cloud TPU = 4 TPU v3 chips (8 cores) attached to a CPU host + high-speed interconnects + compiler magic 🦄
  13. What are Cloud TPU Pods?

      A Cloud TPU Pod = 1,024 TPU v3 chips (2,048 cores) attached to many CPU hosts + high-speed interconnects + compiler magic 🦄
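To see what accelerators a JAX process has attached to the current host, the public device-inspection helpers can be used (a small sketch of standard jax calls; the snippet itself is not from the slides):

      import jax

      # Enumerate the accelerator devices this process can see.
      print(jax.device_count())        # total devices across all participating hosts
      print(jax.local_device_count())  # devices attached to this host
      for d in jax.local_devices():
          print(d.platform, d.id)      # platform name and device index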
  14. A Cloud TPU Pod Slice running JAX

      [Diagram: multiple hosts, each attached to TPUs and reached via ssh, each with optional local disk, all connected to Cloud Storage (datasets, checkpoints, etc.)]
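To put all the chips in such a slice to work, the parallelization over multiple accelerators mentioned in the abstract is exposed through jax.pmap. Below is a minimal hedged sketch; the array shapes and the centering computation are illustrative choices, not taken from the slides.

      import functools
      import jax
      import jax.numpy as jnp

      n_dev = jax.local_device_count()

      # Map the function over the leading axis, one slice per device; 'i' names the
      # device axis so collectives like pmean can reduce across it.
      @functools.partial(jax.pmap, axis_name='i')
      def center(x):
          # Illustrative computation (assumed): subtract the cross-device mean.
          return x - jax.lax.pmean(x, axis_name='i')

      x = jnp.arange(n_dev * 4.0).reshape(n_dev, 4)
      print(center(x))  # each device subtracts the cross-device mean from its slice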