Slide 1

AUTO DIFFERENTIATION AND DOING IT IN SCALA XIAYUN SUN | “JOY” | BABYLON HEALTH

Slide 2

THIS IS A USELESS TALK - BUT IT’S KINDA FUN

Slide 3

PROBLEM STATEMENT
▸ Differentiate any (differentiable) function, to any order, automatically
▸ But isn't this just the chain rule?

Slide 4

DIFFERENTIATE PROGRAMMING FUNCTIONS!
▸ We want something like this:

    @autodiff
    def f(x):
        y = 0
        for i in range(4):
            y = sin(x + y)
        return y

▸ Or, in Scala:

    @autodiff
    def f: Double => Double =
      x => (1 to 4).foldLeft(0d) { case (y, _) => sin(x + y) }

Slide 5

THIS IS HOW IT LOOKS IN PYTORCH ▸ Relax — this is still a Scala talk

Slide 6

BUT WHY
▸ Because it's pretty cool — let machines do everything! \o/
▸ Have you heard of "deep learning" / "neural network" / $other_buzzwords?
▸ Define f: input => loss
▸ Minimise the loss by repeatedly moving the inputs a small step against the gradient of f ("gradient descent"), as sketched below
▸ Now imagine writing arbitrary code for f and not worrying about how to differentiate it
▸ PyTorch; TensorFlow
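A minimal sketch of that gradient-descent loop in Scala, with the derivative written by hand for now (the names f, gradF and the learning rate are illustrative; AD is what would replace the hand-written gradF):

    // Minimise f(x) = (x - 3)^2 by stepping against its gradient.
    object GradientDescentSketch extends App {
      val f: Double => Double = x => (x - 3.0) * (x - 3.0)  // loss, minimised at x = 3
      val gradF: Double => Double = x => 2.0 * (x - 3.0)    // derivative, hand-written for now

      val learningRate = 0.1
      // Move the input a small step against the gradient, repeatedly.
      val xMin = (1 to 100).foldLeft(0.0)((x, _) => x - learningRate * gradF(x))
      println(xMin)  // approximately 3.0
    }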

Slide 7

AVAILABLE APPROACHES
▸ Manual / Symbolic / Numerical (see the numerical sketch below)
▸ Auto diff ("AD"): https://arxiv.org/pdf/1404.7456v1.pdf
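For contrast, this is what the numerical approach looks like: a central finite difference, easy to write but only an approximation (the step size h is an illustrative choice, not from the talk):

    import scala.math.sin

    // Central finite difference: approximates f'(x), subject to truncation and rounding error.
    def numericDiff(f: Double => Double, x: Double, h: Double = 1e-6): Double =
      (f(x + h) - f(x - h)) / (2 * h)

    println(numericDiff(sin, 0.0))  // close to 1.0 = cos(0), but not exact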

Slide 8

"DUAL NUMBER" TRICK
▸ Replace each numerical variable with a pair (self, derivative)
▸ Set `derivative` = 1 for the variable to differentiate with respect to, 0 otherwise
▸ These pairs follow a set of algebraic rules
▸ Apply the function to the dual-number pair using that algebra
▸ Magically: f(x) => f((x, x')) = (f(x), f'(x)) (see the sketch after this list)
▸ The math behind it is quite neat; see the references at the end
https://en.wikipedia.org/wiki/Automatic_differentiation
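A minimal forward-mode sketch of the dual-number trick, covering only +, * and sin (the Dual name and its methods are illustrative, not the live-demo code):

    import scala.math

    // A value paired with its derivative.
    case class Dual(value: Double, deriv: Double) {
      def +(that: Dual): Dual = Dual(value + that.value, deriv + that.deriv)
      def *(that: Dual): Dual =                       // product rule: (uv)' = u'v + uv'
        Dual(value * that.value, deriv * that.value + value * that.deriv)
    }

    // sin lifted to dual numbers; the chain rule gives cos(x) * x'.
    def sin(d: Dual): Dual = Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

    // f(x) = sin(x * x), evaluated at x = 2 with derivative seed 1:
    val x = Dual(2.0, 1.0)
    println(sin(x * x))  // Dual(sin(4), 4 * cos(4)) = (f(2), f'(2))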

Slide 9

DO IT IN SCALA
▸ Operator overloading for dual numbers via implicits (sketched below)
▸ Source transformation via Scalameta
▸ Natural higher-order functions
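As a hint of the first bullet, a sketch of lifting plain Double constants into dual numbers with an implicit conversion, so the overloaded operators work on mixed expressions (this assumes the Dual case class from the previous sketch; DualSyntax and constToDual are illustrative names):

    object DualSyntax {
      import scala.language.implicitConversions
      // Constants carry derivative 0, so they lift silently into dual arithmetic.
      implicit def constToDual(c: Double): Dual = Dual(c, 0.0)
    }

    import DualSyntax._
    val y = Dual(2.0, 1.0) * 3.0   // 3.0 is lifted to Dual(3.0, 0.0) automatically
    println(y)                     // Dual(6.0, 3.0): d/dx (3 * x) at x = 2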

Slide 10

⚠ LIVE CODING ⚠

Slide 11

MORE INTERESTING STUFF
▸ What we did is "forward AD"; there's also "reverse AD" (sketched below)
▸ Reverse AD gives the full gradient vector in one backward pass instead of a single derivative
▸ Actually useful
▸ Poke the AST at the compiler level
▸ S4TF (Swift for TensorFlow)
▸ Lambda calculus thingy
▸ "Lambda the Ultimate Backpropagator"
▸ http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf
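For a flavour of reverse AD, a very small sketch in which each value records its parents and local gradients, and a single backward sweep accumulates the whole gradient (the Var class and names are illustrative; see the linked paper for the real treatment):

    import scala.math

    // Each Var remembers how it was computed: (parent, local gradient) pairs.
    class Var(val value: Double, val parents: Seq[(Var, Double)] = Nil) {
      var grad: Double = 0.0
      def +(that: Var): Var = new Var(value + that.value, Seq(this -> 1.0, that -> 1.0))
      def *(that: Var): Var = new Var(value * that.value, Seq(this -> that.value, that -> value))
      // Propagate an incoming gradient to this node and on to its parents.
      def backward(seed: Double = 1.0): Unit = {
        grad += seed
        for ((p, localGrad) <- parents) p.backward(seed * localGrad)
      }
    }

    def sin(v: Var): Var = new Var(math.sin(v.value), Seq(v -> math.cos(v.value)))

    // f(a, b) = sin(a * b): one backward pass yields df/da and df/db together.
    val a = new Var(1.0); val b = new Var(2.0)
    val out = sin(a * b)
    out.backward()
    println((a.grad, b.grad))  // (2 * cos(2), 1 * cos(2))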

Slide 12

LINKS & REFERENCES
▸ Code from the live demo: search for "autodiff.scala" in the gists of @xysun
▸ Wikipedia on AD: https://en.wikipedia.org/wiki/Automatic_differentiation
▸ Really nice & easy paper: https://arxiv.org/pdf/1404.7456v1.pdf
▸ Math behind dual numbers (section "forward mode"): https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/
▸ Lambda paper: http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf
▸ S4TF AD notes: https://gist.github.com/rxwei/30ba75ce092ab3b0dce4bde1fc2c9f1d
▸ Really interesting take on different neural networks and functional programming constructs: http://colah.github.io/posts/2015-09-NN-Types-FP/