TRANSCRIPT
Neural Ordinary Differential Equations
Ricky T. Q. Chen*, Yulia Rubanova*, Jesse Bettencourt*, David Duvenaud
University of Toronto
Background: Ordinary Differential Equations (ODEs)
- Models the instantaneous change of a state (explicit form).
- Solving an initial value problem (IVP) corresponds to integration (the solution is a trajectory).
- The Euler method approximates the solution with small steps.
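The Euler method mentioned above can be sketched in a few lines of Python (an illustrative toy, not the adaptive solver the paper relies on):

```python
def euler_solve(f, z0, t0, t1, n_steps=1000):
    """Approximate z(t1) for dz/dt = f(z, t) with z(t0) = z0 using forward Euler."""
    h = (t1 - t0) / n_steps  # fixed step size
    z, t = z0, t0
    for _ in range(n_steps):
        z = z + h * f(z, t)  # z_{t+1} = z_t + h * f(z_t, t)
        t += h
    return z

# Example: dz/dt = z with z(0) = 1 has the exact solution z(t) = e^t.
approx = euler_solve(lambda z, t: z, 1.0, 0.0, 1.0)
print(approx)  # close to e ≈ 2.71828
```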
Residual Networks interpreted as an ODE Solver
- Hidden units look like a step update of the state.
- The final output is the composition of these updates.
- This can be interpreted as an Euler discretization of an ODE.
Haber & Ruthotto (2017); E (2017).
- In the limit of smaller steps, the hidden state follows an ODE.
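In symbols, the residual update and its continuous-time limit (the standard equations from the paper):

```latex
h_{t+1} = h_t + f(h_t, \theta_t)
\qquad\longrightarrow\qquad
\frac{dz(t)}{dt} = f(z(t), t, \theta)
```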
Deep Learning as Discretized Differential Equations
Many deep learning networks can be interpreted as ODE solvers.

| Network | Fixed-step numerical scheme |
| --- | --- |
| ResNet, RevNet, ResNeXt, etc. | Forward Euler |
| PolyNet | Approximation to backward Euler |
| FractalNet | Runge-Kutta |
| DenseNet | Runge-Kutta |

Lu et al. (2017); Chang et al. (2018); Zhu et al. (2018)

But:
(1) What are the underlying dynamics?
(2) Adaptive step-size solvers provide better error handling.
“Neural” Ordinary Differential Equations
Parameterize the dynamics with a neural network.
Instead of y = F(x), solve y = z(T) given the initial condition z(0) = x.
Solve the dynamics using any black-box ODE solver:
- Adaptive step size.
- Error estimates.
- O(1)-memory learning.
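Written out (as in the paper), the output is obtained by integrating the learned dynamics:

```latex
z(0) = x, \qquad \frac{dz(t)}{dt} = f(z(t), t, \theta), \qquad
y = z(T) = z(0) + \int_0^T f(z(t), t, \theta)\, dt
```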
Backprop without Knowledge of the ODE Solver
Ultimately we want to optimize some loss of the ODE solution.
Naive approach: know the solver and backprop through its internal operations.
- Memory-intensive.
- The family of “implicit” solvers performs an inner optimization.
Our approach: adjoint sensitivity analysis (reverse-mode autodiff); Pontryagin (1962).
+ Automatic differentiation.
+ O(1) memory in the backward pass.
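The adjoint equations from the paper, with a(t) defined as the gradient of the loss with respect to the state:

```latex
a(t) = \frac{\partial L}{\partial z(t)}, \qquad
\frac{da(t)}{dt} = -\, a(t)^{\top} \frac{\partial f(z(t), t, \theta)}{\partial z}, \qquad
\frac{dL}{d\theta} = -\int_T^0 a(t)^{\top} \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt
```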
Continuous-time Backpropagation
Residual network:
- Forward: step the hidden state through each layer.
- Backward: backpropagate the loss gradient through each layer.
- Params: accumulate per-layer parameter gradients.
Adjoint method (define the adjoint state as the gradient of the loss with respect to the hidden state):
- Forward: solve the ODE for z(t).
- Backward: solve the adjoint differential equation backwards in time.
- Params: compute parameter gradients with one more integral.
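As a sanity check on the adjoint method, here is a self-contained toy (my own sketch, not the paper's code) for the scalar ODE dz/dt = θ·z with loss L = z(T), whose true gradient is dL/dθ = z0·T·e^(θT):

```python
import math

def adjoint_grad(theta, z0=1.0, T=1.0, n=2000):
    """Gradient of L = z(T) w.r.t. theta for dz/dt = theta * z, via the adjoint method."""
    h = T / n
    # Forward pass with Euler steps. (We store the trajectory for simplicity;
    # a real implementation re-solves z backwards to keep memory O(1).)
    zs = [z0]
    for _ in range(n):
        zs.append(zs[-1] + h * theta * zs[-1])
    # Backward pass: da/dt = -a * df/dz = -a * theta, with a(T) = dL/dz(T) = 1.
    a, grad = 1.0, 0.0
    for i in range(n, 0, -1):
        grad += h * a * zs[i]  # accumulate dL/dtheta = integral of a * df/dtheta
        a += h * a * theta     # integrate the adjoint backwards in time
    return grad

theta = 0.5
approx = adjoint_grad(theta)
exact = 1.0 * 1.0 * math.exp(theta * 1.0)  # z0 * T * e^(theta*T)
print(approx, exact)
```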
A Differentiable Primitive for AutoDiff
We don’t need to store layer activations for the reverse pass: just follow the dynamics in reverse!
Reversible networks (Gomez et al. 2018) also require only O(1) memory, but demand very specific architectures with partitioned dimensions.
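A toy illustration of this point (my own sketch): integrate forward with Euler, keep only the final state, then follow the same dynamics backwards to recover the initial state:

```python
def f(z, t):
    return -z  # simple dynamics dz/dt = -z

n, T = 10_000, 1.0
h = T / n

# Forward pass: z(0) -> z(T); we keep only the final state.
z = 1.0
for i in range(n):
    z = z + h * f(z, i * h)
z_T = z

# Reverse pass: follow the same dynamics backwards from z(T),
# without any stored trajectory.
z = z_T
for i in range(n, 0, -1):
    z = z - h * f(z, i * h)

print(z_T, z)  # z is approximately the original z(0) = 1.0
```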
Reverse versus Forward Cost
- Empirically, the reverse pass is roughly half as expensive as the forward pass.
- Adapts to instance difficulty.
- The number of function evaluations can be viewed as the number of layers in the network.
NFE = Number of Function Evaluations.
Dynamics Become Increasingly Complex
- Dynamics become more demanding to compute during training.
- Computation time adapts to the complexity of the differential equation.
In contrast, Chang et al. (ICLR 2018) explicitly add layers during training.
Continuous-time RNNs for Time Series Modeling
- We often want arbitrary measurement times, i.e. irregular time intervals.
- Can do VAE-style inference with a latent ODE.
ODEs vs Recurrent Neural Networks (RNNs)
- RNNs learn very stiff dynamics and can have exploding gradients.
- ODE solutions, in contrast, are guaranteed to be smooth.
Continuous Normalizing Flows
Instantaneous change of variables (iCOV):
- For a Lipschitz continuous function, the change in log-density follows a differential equation.
- In other words, the log-density evolves with the trace of the Jacobian of the dynamics.
- Compare with an invertible F: the discrete change of variables requires a log-determinant of the Jacobian.
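The iCOV formula from the paper, next to the discrete change of variables it replaces:

```latex
\text{Discrete flow } (z_1 = F(z_0)):\quad
\log p(z_1) = \log p(z_0) - \log\left|\det \frac{\partial F}{\partial z_0}\right|
\qquad
\text{CNF}:\quad
\frac{\partial \log p(z(t))}{\partial t} = -\operatorname{Tr}\!\left(\frac{\partial f}{\partial z(t)}\right)
```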
Continuous Normalizing Flows
(Figures: 1D and 2D density fits comparing Data, Discrete-NF, and CNF.)
Is the ODE being correctly solved?
Stochastic Unbiased Log Density
Can further reduce time complexity using stochastic estimators.
Grathwohl et al. (2019)
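The stochastic estimator referred to here is, in FFJORD, Hutchinson's trace identity, Tr(A) = E[εᵀAε] for zero-mean noise ε with identity covariance. A minimal pure-Python sketch of the identity itself (the matrix and sample count are illustrative):

```python
import random

random.seed(0)

# A fixed test matrix; its exact trace is 1.0 + 2.0 + 3.0 = 6.0.
A = [[1.0, 0.5, -0.3],
     [0.2, 2.0, 0.7],
     [-0.4, 0.1, 3.0]]

def hutchinson_trace(A, n_samples=50_000):
    """Estimate Tr(A) as the sample mean of eps^T A eps with Rademacher noise eps."""
    d = len(A)
    total = 0.0
    for _ in range(n_samples):
        eps = [random.choice((-1.0, 1.0)) for _ in range(d)]
        A_eps = [sum(A[i][j] * eps[j] for j in range(d)) for i in range(d)]
        total += sum(eps[i] * A_eps[i] for i in range(d))
    return total / n_samples

est = hutchinson_trace(A)
print(est)  # close to the exact trace 6.0
```

In FFJORD this replaces the exact Jacobian trace in the iCOV formula, needing only vector-Jacobian products.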
FFJORD - Stochastic Continuous Flows
Grathwohl et al. (2019)
(Figure: model samples on MNIST and CIFAR10.)
Variational Autoencoders with FFJORD
ODE Solving as a Modeling Primitive
Adaptive-step solvers with O(1)-memory backprop.
github.com/rtqichen/torchdiffeq
Future directions we’re currently working on:
- Latent stochastic differential equations.
- Network architectures suited for ODEs.
- Regularization of dynamics to require fewer evaluations.
Thanks!
Co-authors: Yulia Rubanova, Jesse Bettencourt, David Duvenaud
Extra Slides
Latent Space Visualizations
- Released an implementation of reverse-mode autodiff through black-box ODE solvers.
- Solves a system of size 2D + K + 1.
- In contrast, a forward-mode implementation solves a system of size D^2 + KD.
- TensorFlow has a Dormand-Prince-Shampine Runge-Kutta 5(4) solver implemented, but uses naive autodiff for backpropagation.
How much precision is needed?
Explicit Error Control
- More fine-grained control than low-precision floats.
- Cost scales with instance difficulty.
NFE = Number of Function Evaluations.
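A toy adaptive-step integrator (my own illustration, not the dopri5 solver the paper uses) makes the NFE-versus-tolerance trade-off concrete: halve the step when the local error estimate exceeds the tolerance, grow it when the error is small, and count function evaluations:

```python
def adaptive_euler(f, z0, t0, t1, tol):
    """Toy adaptive solver: compare one full Euler step against two half steps,
    use their difference as a local error estimate, and count NFE."""
    z, t, h = z0, t0, (t1 - t0) / 10
    nfe = 0
    while t1 - t > 1e-9:
        h = min(h, t1 - t)
        k = f(z, t); nfe += 1
        full = z + h * k                                      # one full step
        mid = z + 0.5 * h * k
        half = mid + 0.5 * h * f(mid, t + 0.5 * h); nfe += 1  # two half steps
        err = abs(half - full)                                # local error estimate
        if err < tol:
            z, t = half, t + h
            h *= 1.5   # grow the step when the error is small
        else:
            h *= 0.5   # shrink the step and retry
    return z, nfe

f = lambda z, t: -z
z_loose, nfe_loose = adaptive_euler(f, 1.0, 0.0, 5.0, tol=1e-2)
z_tight, nfe_tight = adaptive_euler(f, 1.0, 0.0, 5.0, tol=1e-5)
print(nfe_loose, nfe_tight)  # tighter tolerance costs more function evaluations
```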
Computation Depends on Complexity of Dynamics
- Time cost is dominated by evaluation of dynamics f.
NFE = Number of Function Evaluations.
Why not use an ODE solver as a modeling primitive?
- Solving an ODE is expensive.
Future Directions
- Stochastic differential equations and random ODEs; approximates stochastic gradient descent.
- Scaling up ODE solvers with machine learning.
- Partial differential equations.
- Graphics, physics, simulations.