
# Automatic differentiation in Scala by Xiayun Sun

April 25, 2019

## Transcript

1. ### AUTO DIFFERENTIATION AND DOING IT IN SCALA

   XIAYUN SUN | “JOY” | BABYLON HEALTH

3. ### PROBLEM STATEMENT

   ▸ Differentiate any (differentiable) function, to any order, automatically
   ▸ But isn’t this just the chain rule?
4. ### DIFFERENTIATE PROGRAMMING FUNCTIONS! ▸ We want something like this:

   ```python
   @autodiff
   def f(x):
       y = 0
       for i in range(4):
           y = sin(x + y)
       return y
   ```

   ▸ Or, in Scala:

   ```scala
   @autodiff
   def f: Double => Double = x => (1 to 4).foldLeft(0d) { case (y, _) => sin(x + y) }
   ```
5. ### THIS IS HOW IT LOOKS IN PYTORCH ▸ Relax — this is still a Scala talk
6. ### BUT WHY

   ▸ Because it’s pretty cool — let machines do everything! \o/
   ▸ Have you heard of “deep learning” / “neural networks” / $other_buzzwords?
   ▸ Define f: input => loss
   ▸ Minimise the loss by moving the inputs slightly along the (negative) gradient of f (“gradient descent”)
   ▸ Now imagine writing arbitrary code for f without worrying about how to differentiate it
   ▸ PyTorch; TensorFlow
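The “gradient descent” step mentioned above is the standard update rule (the step-size symbol η is introduced here for illustration, not from the slides):

```latex
x_{t+1} = x_t - \eta \, \nabla f(x_t)
```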

8. ### “DUAL NUMBER” TRICK

   ▸ Replace each numerical variable by a pair of (self, derivative)
   ▸ Set `derivative` = 1 for the variable to differentiate on, 0 otherwise
   ▸ These pairs follow a set of algebraic rules
   ▸ Apply the function with the dual-number pairs and their algebra
   ▸ Magically: f(x) => f((x, x’)) = (f(x), f’(x))
   ▸ The math behind this is quite neat; see the references at the end and https://en.wikipedia.org/wiki/Automatic_differentiation
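The algebraic rules the slide alludes to are the standard dual-number identities, e.g. for sum, product, and sine:

```latex
(a, a') + (b, b') = (a + b,\ a' + b') \\
(a, a') \cdot (b, b') = (ab,\ a'b + ab') \\
\sin\big((a, a')\big) = (\sin a,\ a' \cos a)
```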
9. ### DO IT IN SCALA

   ▸ Operator overloading for dual numbers via implicits
   ▸ Source transformation via Scalameta
   ▸ Natural higher-order functions
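A minimal sketch of the dual-number trick in Scala; the `Dual` type and helper names here are illustrative, not taken from the talk's gist:

```scala
// Forward-mode AD via dual numbers: each value carries its derivative along.
case class Dual(value: Double, deriv: Double) {
  def +(that: Dual): Dual = Dual(value + that.value, deriv + that.deriv)
  // Product rule: (ab)' = a'b + ab'
  def *(that: Dual): Dual = Dual(value * that.value, deriv * that.value + value * that.deriv)
}

object Dual {
  def const(x: Double): Dual    = Dual(x, 0.0) // constants carry derivative 0
  def variable(x: Double): Dual = Dual(x, 1.0) // the variable we differentiate on carries 1
  def sin(d: Dual): Dual        = Dual(math.sin(d.value), d.deriv * math.cos(d.value))
}

// The talk's example function, written against Dual instead of Double:
def f(x: Dual): Dual =
  (1 to 4).foldLeft(Dual.const(0.0)) { case (y, _) => Dual.sin(x + y) }

// One forward pass computes both f(x0) and f'(x0):
val Dual(fx, dfx) = f(Dual.variable(0.5))
```

Wrapping these operators in an implicit class over `Double` (as the slide suggests) lets existing numeric code run on `Dual` values unchanged.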

11. ### MORE INTERESTING STUFF

    ▸ What we did is “forward AD”; there’s also “reverse AD”
    ▸ Full gradient vector instead of a single derivative
    ▸ Actually useful
    ▸ Poke the AST at the compiler level
    ▸ S4TF
    ▸ Lambda calculus thingy
    ▸ “Lambda the Ultimate Backpropagator”
    ▸ http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf
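For contrast with the forward mode above, here is a toy tape-based sketch of reverse AD. All names (`Tape`, `RVar`) are illustrative and assume a single global tape; production systems are considerably more careful:

```scala
import scala.collection.mutable.ArrayBuffer

// Toy reverse-mode AD: record each operation on a tape during the forward
// pass, then replay the tape backwards to accumulate gradients.
object Tape {
  val steps = ArrayBuffer.empty[() => Unit]
  def runBackward(): Unit = steps.reverseIterator.foreach(step => step())
}

class RVar(val value: Double) {
  var grad: Double = 0.0

  def +(that: RVar): RVar = {
    val out = new RVar(value + that.value)
    Tape.steps += (() => { this.grad += out.grad; that.grad += out.grad })
    out
  }

  def sin: RVar = {
    val out = new RVar(math.sin(value))
    Tape.steps += (() => { this.grad += out.grad * math.cos(value) })
    out
  }
}

// Same example function as before; one backward pass yields df/dx.
val x = new RVar(0.0)
val y = (1 to 4).foldLeft(new RVar(0.0)) { case (acc, _) => (x + acc).sin }
y.grad = 1.0       // seed the output gradient
Tape.runBackward() // x.grad now holds f'(0.0)
```

With many inputs, one backward pass fills in the whole gradient vector at once, which is why reverse mode is the “actually useful” one for neural-network losses.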
12. ### LINKS & REFERENCES

    ▸ Code from the live demo: search for “autodiff.scala” in the gists from @xysun
    ▸ Wikipedia on AD: https://en.wikipedia.org/wiki/Automatic_differentiation
    ▸ Really nice & easy paper: https://arxiv.org/pdf/1404.7456v1.pdf
    ▸ Math behind dual numbers (section “forward mode”): https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/
    ▸ Lambda paper: http://www-bcl.cs.may.ie/~barak/papers/toplas-reverse.pdf
    ▸ S4TF AD notes: https://gist.github.com/rxwei/30ba75ce092ab3b0dce4bde1fc2c9f1d
    ▸ Really interesting take on different neural networks and functional programming constructs: http://colah.github.io/posts/2015-09-NN-Types-FP/