Continuous Depth models and a brief overview of Neural ODE

Foundation of continuous-depth models .. and a brief introduction to Neural ODE
Ayan Das, PhD Student, SketchX, CVSSP
SketchX meetup talk, September 2, 2021
Exploring the notion of continuous-depth & a brief overview of Neural ODE
Copyright © 2021 Ayan Das (@dasayan05)

Presentation Overview
1. Foundation of Continuous Depth
2. ODE, Interpretation & Applications

Part 1: Foundation of Continuous Depth
   Motivation
   Revisiting ResNet
   Modifying ResNet

Motivation: Deep architectures so far
Deep models so far:
1. The notion of depth is discrete
2. Parameter count increases with the number of layers
3. Backpropagation takes O(L) memory for L layers

Motivation: Design motivation
Can we have the following?
1. A continuous notion of depth
2. An input that "evolves" along depth
3. Low-cost backpropagation

Revisiting ResNet: Generic ResNet
ResNet motivated the development of Neural ODEs.

ResNet block (isolated). For each layer $l$:
$$x_{l+1} \leftarrow x_l + g(x_l), \qquad x_l,\ x_{l+1} \in \mathbb{R}^{M \times N \times \cdots}$$

ResNet architecture (no downstream task), i.e., several blocks cascaded:
$$x_{l+1} \leftarrow x_l + g(x_l; \Theta_1)$$
$$x_{l+2} \leftarrow x_{l+1} + g(x_{l+1}; \Theta_2)$$
$$x_{l+3} \leftarrow x_{l+2} + g(x_{l+2}; \Theta_3)$$

Remember! The blocks have different parameters, and that is where the majority of the ResNet's modelling capacity lies.
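The cascade above can be sketched in a few lines of plain Python. This is only an illustration: the residual branch `g` (an elementwise `tanh(w * x + b)`) and the parameter values are hypothetical stand-ins, not taken from the slides; the point is just that each block carries its own `Theta`.

```python
import math

def g(x, theta):
    """Hypothetical residual branch: elementwise tanh(w * x + b)."""
    w, b = theta
    return [math.tanh(w * xi + b) for xi in x]

def resnet_forward(x, thetas):
    """Cascade of residual blocks: x_{l+1} <- x_l + g(x_l; Theta_{l+1})."""
    for theta in thetas:
        x = [xi + gi for xi, gi in zip(x, g(x, theta))]
    return x

# Three blocks, each with its own (made-up) parameters Theta_1, Theta_2, Theta_3.
thetas = [(0.5, 0.0), (-1.0, 0.1), (0.3, -0.2)]
x0 = [1.0, -2.0]
x3 = resnet_forward(x0, thetas)
print(x3)
```

Note that the output keeps the shape of the input, which is what allows blocks to be stacked indefinitely, and that the parameter list grows with the number of blocks.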

Modifying ResNet: Exploiting the ResNet structure .. for continuous-depth models
Precisely, three changes:
1. Scale $g(\cdot)$ with a scalar $\Delta l$ → Quite harmless; $g(\cdot)$ can always adjust by learning to scale itself up.
2. Share parameters across blocks → Reduces the number of parameters, and hence the modelling capacity.
3. Use many more blocks than usual → To compensate for the lost capacity, we want more blocks.

$$x_1 \leftarrow x_0 + g(x_0; \Theta) \cdot \Delta l$$
$$x_2 \leftarrow x_1 + g(x_1; \Theta) \cdot \Delta l$$
$$x_3 \leftarrow x_2 + g(x_2; \Theta) \cdot \Delta l$$
$$\vdots$$
$$x_L \leftarrow x_{L-1} + g(x_{L-1}; \Theta) \cdot \Delta l$$
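A minimal sketch of the three changes, assuming a scalar state and a hypothetical branch `g(x; Theta) = tanh(w * x + b)` (neither is specified on the slides). The same `theta` is reused in every block, the update is scaled by `delta_l`, and the number of blocks becomes a free knob that does not affect the parameter count.

```python
import math

def g(x, theta):
    """Hypothetical shared residual branch on a scalar state."""
    w, b = theta
    return math.tanh(w * x + b)

def shared_resnet_forward(x0, theta, num_blocks, delta_l):
    """Apply x_{l+1} <- x_l + g(x_l; Theta) * delta_l with one shared Theta."""
    x = x0
    for _ in range(num_blocks):
        x = x + g(x, theta) * delta_l
    return x

theta = (0.5, 0.0)  # a single (made-up) parameter set, regardless of depth
shallow = shared_resnet_forward(1.0, theta, num_blocks=3, delta_l=1.0)
deep = shared_resnet_forward(1.0, theta, num_blocks=300, delta_l=0.01)
print(shallow, deep)
```

Both calls cover the same total depth `num_blocks * delta_l = 3`, but the second does so with a hundred times more, and a hundred times smaller, steps.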

Exploiting the ResNet structure (continued)
With input $x_0$, we want to compute the output at the $L$-th block:
$$x_L \leftarrow x_0 + g(x_0; \Theta) \cdot \Delta l + g(x_1; \Theta) \cdot \Delta l + \cdots + g(x_{L-1}; \Theta) \cdot \Delta l$$
$$x_L \leftarrow x_0 + \sum_{l=0}^{L-1} g(x_l; \Theta) \cdot \Delta l$$

Can we have an infinitely long ResNet, i.e., $L \to \infty$?
The summation might blow up (in either direction): $x_L \to -\infty$ or $x_L \to +\infty$.

Solution: simultaneously do the following:
$$L \to \infty, \qquad \Delta l \to 0$$

Emergence of Continuous Depth
In the limiting case:
$$x_L = x_0 + \lim_{\substack{L \to \infty \\ \Delta l \to 0}} \sum_{l=0}^{L-1} g(x_l; \Theta) \cdot \Delta l$$
$$x_L = x_0 + \int_0^L g(x_l; \Theta)\, dl$$
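The limit can be checked numerically on a toy case. The sketch below assumes a hypothetical linear branch `g(x; theta) = theta * x`, chosen only because the continuous-depth answer is then known in closed form (`x0 * exp(theta * depth)`). It compares two regimes: growing the number of blocks with a fixed step, and growing it while shrinking the step so that the total depth `num_blocks * delta_l` stays fixed.

```python
import math

def g(x, theta):
    """Hypothetical linear branch, used because its ODE has a closed-form solution."""
    return theta * x

def residual_sum(x0, theta, num_blocks, delta_l):
    """x_L = x_0 + sum_l g(x_l; theta) * delta_l, computed block by block."""
    x = x0
    for _ in range(num_blocks):
        x = x + g(x, theta) * delta_l
    return x

x0, theta, depth = 1.0, 1.0, 1.0
exact = x0 * math.exp(theta * depth)

# More blocks with a fixed step: the output grows without bound.
fixed_step = [residual_sum(x0, theta, L, 1.0) for L in (10, 100)]

# More blocks AND a smaller step (L * delta_l = depth): the output converges.
refined = [residual_sum(x0, theta, L, depth / L) for L in (10, 100, 1000)]
errors = [abs(value - exact) for value in refined]
print(fixed_step, refined, errors)
```

In the second regime the error shrinks as the number of blocks grows, which is the discrete sum turning into the integral.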

Continuous Depth ResNet: a new kind of ResNet
$$x \;\to\; x + \int_0^L g(x; \Theta)\, dl \;\to\; y, \qquad F : x \to y$$
Parameterized by two things:
1. The internal function $g$, along with $\Theta$
2. The integration limits, i.e. $\int_{l_1}^{l_2}$

Surprised? $L$ can be a fraction. So, an $L = 1.234$ layer network (?)

Continuous Depth RNN: intermediate values matter
$$x + \int_0^{\,l_1 \text{ or } l_2 \text{ or } l_3} g(x; \Theta)\, dl$$
Unlike an RNN, we can produce intermediate values at any (non-uniform) interval.
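As a rough illustration of fractional depth and non-uniform readouts, the sketch below integrates a hypothetical branch `g(x; theta) = theta * x` with small Euler steps and records the state at arbitrary depths, including the `1.234` from the previous slide. The helper name `evaluate_at_depths` and the step size are assumptions made for this example.

```python
import math

def g(x, theta):
    """Hypothetical linear branch; exact solution is x0 * exp(theta * l)."""
    return theta * x

def evaluate_at_depths(x0, theta, depths, step=1e-3):
    """Euler-integrate from depth 0 and record the state at each requested
    depth. `depths` must be sorted in ascending order."""
    outputs = []
    x, current = x0, 0.0
    for target in depths:
        # Split the gap to the next requested depth into steps of size <= step.
        n = max(1, math.ceil((target - current) / step))
        h = (target - current) / n
        for _ in range(n):
            x = x + g(x, theta) * h
        current = target
        outputs.append(x)
    return outputs

depths = [0.3, 1.234, 2.0]  # non-uniform, and not integers
states = evaluate_at_depths(1.0, -1.0, depths)
print(list(zip(depths, states)))
```

Each recorded state should sit close to `exp(-l)` for the corresponding depth `l`; a discrete RNN or ResNet would only expose values at integer layer indices.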

Part 2: ODE, Interpretation & Applications
   Neural Ordinary Differential Equation
   Interpretation
   Potential applications
   Practical Usage

Neural Ordinary Differential Equation: where is the differential equation?
Neural Ordinary Differential Equation [1]
Write the input as $x = F(l = 0)$ and the output as $y = F(l = L)$:
$$y = x + \int_0^L g(x; \Theta)\, dl$$
$$y = F(l = 0) + \int_0^L g(x; \Theta)\, dl$$
$$F(l = L) = F(l = 0) + \int_0^L g(x; \Theta)\, dl$$
$$F(l = L) = F(l = 0) + \int_0^L \frac{\partial F}{\partial l}\, dl$$
So $g$ plays the role of the derivative $\partial F / \partial l$, and the discrete ResNet update corresponds to a popular ODE solver called Euler integration.

First proposed in the NeurIPS 2018 (best) paper titled "Neural ODE" [1]. Parameterize the derivative as a neural network:
$$\frac{\partial F}{\partial l} = \text{NeuralNet}(x; \Theta)$$

[1] "Neural Ordinary Differential Equations" by Ricky T. Q. Chen et al. (https://arxiv.org/pdf/1806.07366.pdf)

Backpropagation: Naive backpropagation
How do we backpropagate?
$$\frac{\partial(\text{loss})}{\partial x},\ \frac{\partial(\text{loss})}{\partial \Theta} \;\leftarrow\; x + \int_0^L \frac{\partial F}{\partial l}\, dl \;\leftarrow\; \frac{\partial(\text{loss})}{\partial y}$$
Why not just discretize and backpropagate trivially?
Nope, don't do it!
1. It introduces discretization error
2. Memory consumption blows up
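Both objections can be seen on a toy problem. The sketch below assumes a hypothetical scalar branch `g(x; theta) = theta * x` and the loss `loss = y`, so the exact continuous-depth gradient is known (`depth * x0 * exp(theta * depth)`). It discretizes with Euler steps and backpropagates through them by hand; the per-step derivatives are hardcoded for this particular `g`.

```python
import math

def g(x, theta):
    """Hypothetical linear branch."""
    return theta * x

def naive_forward(x0, theta, depth, n):
    """Euler forward pass that stores every intermediate state for backprop."""
    h = depth / n
    trajectory = [x0]
    for _ in range(n):
        x = trajectory[-1]
        trajectory.append(x + g(x, theta) * h)
    return trajectory

def naive_backward(trajectory, dloss_dy, theta, depth):
    """Chain rule through each stored step x_{k+1} = x_k + theta * x_k * h."""
    n = len(trajectory) - 1
    h = depth / n
    grad_x, grad_theta = dloss_dy, 0.0
    for x in reversed(trajectory[:-1]):
        grad_theta += grad_x * x * h          # d x_{k+1} / d theta = x_k * h
        grad_x = grad_x * (1.0 + theta * h)   # d x_{k+1} / d x_k = 1 + theta * h
    return grad_x, grad_theta

x0, theta, depth, n = 2.0, 0.5, 1.0, 100
trajectory = naive_forward(x0, theta, depth, n)
grad_x0, grad_theta = naive_backward(trajectory, 1.0, theta, depth)
exact_grad_theta = depth * x0 * math.exp(theta * depth)
print(len(trajectory), grad_theta, exact_grad_theta)
```

The stored trajectory has `n + 1` entries, so memory grows linearly with the number of steps, and the computed gradient is the gradient of the discretized model rather than of the continuous one, so it differs from the exact value.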

Backpropagation: Adjoint backpropagation [1][2]
$$x \;\to\; x + \int_0^L \frac{\partial F}{\partial l}\, dl \;\to\; y$$
$$\frac{\partial(\text{loss})}{\partial x},\ \frac{\partial(\text{loss})}{\partial \Theta} \;\leftarrow\; \text{Adjoint Backprop.} \;\leftarrow\; \frac{\partial(\text{loss})}{\partial y}$$

[1] "Neural Ordinary Differential Equations" by Ricky T. Q. Chen et al. (https://arxiv.org/pdf/1806.07366.pdf)
[2] https://ayandas.me/blog-tut/2020/03/20/neural-ode.html
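The idea behind the adjoint method, as described in Chen et al., is to define $a(l) = \partial(\text{loss}) / \partial F(l)$, start from $a(L) = \partial(\text{loss}) / \partial y$, and integrate $da/dl = -a\, \partial g / \partial x$ backwards in depth while accumulating $\partial(\text{loss}) / \partial \Theta = \int_0^L a\, \partial g / \partial \Theta\, dl$. The sketch below is a hand-rolled Euler version of that backward pass for a hypothetical scalar branch `g(x; theta) = theta * x` with `loss = y`; it is not how torchdiffeq implements it, and the derivative helpers are written out for this specific `g`.

```python
import math

def g(x, theta):
    """Hypothetical linear branch."""
    return theta * x

def dg_dx(x, theta):
    return theta

def dg_dtheta(x, theta):
    return x

def forward(x0, theta, depth, n):
    """Euler forward pass that keeps only the final state."""
    h = depth / n
    x = x0
    for _ in range(n):
        x = x + g(x, theta) * h
    return x

def adjoint_backward(y, dloss_dy, theta, depth, n):
    """Integrate the state, the adjoint and the parameter gradient backwards
    from depth L to 0, storing only three scalars at any time."""
    h = depth / n
    x, a, grad_theta = y, dloss_dy, 0.0
    for _ in range(n):
        grad_theta += a * dg_dtheta(x, theta) * h
        a_next = a + a * dg_dx(x, theta) * h  # da/dl = -a * dg/dx, stepped backwards
        x = x - g(x, theta) * h               # re-derive the earlier state
        a = a_next
    return a, grad_theta

x0, theta, depth, n = 2.0, 0.5, 1.0, 10000
y = forward(x0, theta, depth, n)
grad_x0, grad_theta = adjoint_backward(y, 1.0, theta, depth, n)
print(grad_x0, math.exp(theta * depth))
print(grad_theta, depth * x0 * math.exp(theta * depth))
```

No trajectory is stored: the backward pass reconstructs the state as it goes, which is what makes the memory cost independent of the number of solver steps.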

Interpretation: Standard interpretation .. from calculus
Initial Value Problem (IVP)
1. $\frac{\partial F}{\partial l} = \text{NeuralNet}(x; \Theta)$ is a vector field
2. $x \to y$ is its solution trajectory, obtained by solving the IVP

Fitting
By changing $\Theta$, we can fit a given $x_0$ and $Y$.
Supervise intermediate values to fit a whole trajectory.
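A toy version of "fitting by changing Θ", under loud assumptions: the vector field is the hypothetical scalar `g(x; theta) = theta * x`, the IVP is solved with plain Euler steps, and the gradient comes from central finite differences rather than adjoint backpropagation, purely to keep the example short.

```python
import math

def solve_ivp_euler(x0, theta, depth=1.0, n=200):
    """Solve dx/dl = theta * x from depth 0 to `depth` with Euler steps."""
    h = depth / n
    x = x0
    for _ in range(n):
        x = x + theta * x * h
    return x

def fit_theta(x0, y_target, theta=0.0, lr=0.1, iters=200, eps=1e-5):
    """Gradient descent on the squared error between the IVP solution and y_target."""
    def loss(t):
        return (solve_ivp_euler(x0, t) - y_target) ** 2
    for _ in range(iters):
        grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

x0, y_target = 1.0, 2.0
theta_fit = fit_theta(x0, y_target)
print(theta_fit, solve_ivp_euler(x0, theta_fit))
```

For this field the exact answer is `theta = ln(2)`; the fitted value lands near it, with a small offset from the Euler discretization. Supervising intermediate depths as well would simply add more squared-error terms to `loss`.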

Potential applications: Lifespan synthesis [1]
A possible approach

[1] https://grail.cs.washington.edu/projects/lifespan_age_transformation_synthesis/

Practical Usage: Practical implementation (torchdiffeq [1]) .. in PyTorch code

    import torch
    import torch.nn as nn
    # libraries like 'torchdiffeq'
    from torchdiffeq import odeint_adjoint

    # The dF/dl = g(x; theta)
    class Dynamics(nn.Module):
        # torchdiffeq calls forward(l, x): the current depth first, then the state
        def forward(self, l, x):
            return ...

    # forward pass of the ODE layer: x0 is the input tensor, integrated from
    # depth 0 to 1; the solver returns one state per depth, so keep the last
    y = odeint_adjoint(Dynamics(), x0, torch.tensor([0., 1.]))[-1]

[1] https://github.com/rtqichen/torchdiffeq

End of Presentation Thank You ayandas.me dasayan05 dasayan05
