Exponential expressivity in deep neural networks through transient chaos
Ben Poole^1, Subhaneil Lahiri^1, Maithra Raghu^{2,3}, Jascha Sohl-Dickstein^3, Surya Ganguli^1
^1 Stanford University, ^2 Cornell University, ^3 Google Brain
Goal: develop a theoretical understanding of deep neural networks
● Expressivity: represent a large class of functions
● Trainability: tractable algorithms for finding good solutions
● Generalizability: work well in unseen regions of input space
Introduction
Expressivity in random neural networks
Signal propagation in random neural networks
Length propagation
Correlation propagation
[Figure panels: local stretching (χ₁) · local curvature · global curvature (Grassmannian length)]
[Figure: a one-dimensional input manifold propagating through the network, x^0(θ) → x^1(θ) → x^2(θ) → x^3(θ), from input to output]
Fully-connected neural network with nonlinearity φ:

    x_i^l = \phi(h_i^l), \qquad h_i^l = \sum_j W_{ij}^l \, x_j^{l-1} + b_i^l

Independent random Normal weights and biases:

    W_{ij}^l \sim \mathcal{N}\!\left(0, \sigma_w^2 / N_{l-1}\right) \;\text{(weight variance)}, \qquad b_i^l \sim \mathcal{N}\!\left(0, \sigma_b^2\right) \;\text{(bias variance)}
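For concreteness, a minimal runnable sketch of this setup (our illustration, not the authors' code; φ = tanh, and the function name and parameter values are our assumptions):

import numpy as np

def forward(x0, depth, width, sigma_w, sigma_b, seed=0):
    # Propagate x0 through `depth` random layers: h = W x + b, x = tanh(h),
    # with W_ij ~ N(0, sigma_w^2 / N_in) and b_i ~ N(0, sigma_b^2).
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(depth):
        n_in = x.shape[0]
        W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(width, n_in))
        b = rng.normal(0.0, sigma_b, size=width)
        x = np.tanh(W @ x + b)
    return x

x_out = forward(np.ones(784), depth=10, width=1000, sigma_w=2.0, sigma_b=0.3)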
Theory: how do simple input manifolds propagate through a deep network?
● A single point: When does its length grow or shrink, and how fast?
● A pair of points: Do they become more similar or more different? (see the numerical probe below)
● A smooth manifold: How does its curvature and volume change?
Experiment: random deep nets are more expressive than wide shallow nets.
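A quick empirical probe of the pair-of-points question (our sketch; the tanh nonlinearity and the values of σ_w, σ_b are illustrative choices):

import numpy as np

rng = np.random.default_rng(3)
N, depth, sigma_w, sigma_b = 2000, 15, 3.0, 0.3   # sigma_w = 0.5 gives the ordered regime

x1 = rng.normal(size=N)
x2 = x1 + 0.01 * rng.normal(size=N)               # a nearby second point

for layer in range(1, depth + 1):
    W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(0.0, sigma_b, size=N)
    x1, x2 = np.tanh(W @ x1 + b), np.tanh(W @ x2 + b)
    c = (x1 @ x2) / np.sqrt((x1 @ x1) * (x2 @ x2))
    print(f"layer {layer}: correlation = {c:.4f}")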
Self-averaging approximation: for large N_l, the average over neurons in a layer ≈ the average over random weights for one neuron.
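A small numerical check of this approximation (ours, not from the poster): for one wide layer, the neuron average of h_i^2 matches the single-neuron average over random weights, σ_w^2⟨x^2⟩ + σ_b^2.

import numpy as np

rng = np.random.default_rng(1)
N, sigma_w, sigma_b = 10000, 1.5, 0.2

x = np.tanh(rng.normal(size=N))            # arbitrary previous-layer activity
W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
b = rng.normal(0.0, sigma_b, size=N)
h = W @ x + b

q_neurons = np.mean(h ** 2)                                # average over neurons
q_weights = sigma_w ** 2 * np.mean(x ** 2) + sigma_b ** 2  # average over weights
print(q_neurons, q_weights)                # agree up to O(1/sqrt(N)) fluctuations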
Recursion relation for the length of a point as it propagates through the network:

    q^l = \mathcal{V}(q^{l-1} \mid \sigma_w, \sigma_b) = \sigma_w^2 \int \mathcal{D}z \,\phi\!\left(\sqrt{q^{l-1}}\, z\right)^2 + \sigma_b^2, \qquad \mathcal{D}z = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}

q*: fixed point of this iterative map. [Figure: q^l versus depth for increasing σ_w, converging to q*]

For a pair of points with correlation c^{l-1}, the correlation map at the length fixed point is

    c^l = \mathcal{C}(c^{l-1}) = \frac{1}{q^*}\left[\, \sigma_w^2 \int \mathcal{D}z_1\, \mathcal{D}z_2 \,\phi(u_1)\,\phi(u_2) + \sigma_b^2 \right], \qquad u_1 = \sqrt{q^*}\, z_1, \quad u_2 = \sqrt{q^*}\left(c^{l-1} z_1 + \sqrt{1 - (c^{l-1})^2}\, z_2\right)

The correlation map always has a fixed point at c = 1, whose stability is determined by the slope of the correlation map at c = 1:

    \chi_1 = \left.\frac{\partial c^l}{\partial c^{l-1}}\right|_{c=1} = \sigma_w^2 \int \mathcal{D}z \left[\phi'\!\left(\sqrt{q^*}\, z\right)\right]^2

Ordered regime (χ₁ < 1): nearby points become more correlated.
Chaotic regime (χ₁ > 1): nearby points become decorrelated.
[Figure: iterative correlation maps for increasing σ_w]
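These maps are easy to evaluate numerically. A sketch of our own (φ = tanh, Gauss-Hermite quadrature; the authors' released code lives at github.com/ganguli-lab/deepchaos) that iterates the length map to q*, computes χ₁, and iterates the correlation map:

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for E[f(z)] with z ~ N(0, 1).
z, w = hermegauss(80)
w = w / np.sqrt(2.0 * np.pi)
Z1, Z2 = np.meshgrid(z, z)
W2 = np.outer(w, w)

def q_star(sw, sb, q=1.0, iters=500):
    # Length map: q <- V(q) = sw^2 * E[tanh(sqrt(q) z)^2] + sb^2.
    for _ in range(iters):
        q = sw**2 * np.sum(w * np.tanh(np.sqrt(q) * z) ** 2) + sb**2
    return q

def chi_1(sw, sb):
    # Slope of the correlation map at c = 1: sw^2 * E[tanh'(sqrt(q*) z)^2].
    q = q_star(sw, sb)
    dphi = 1.0 - np.tanh(np.sqrt(q) * z) ** 2     # tanh'(h) = 1 - tanh(h)^2
    return sw**2 * np.sum(w * dphi ** 2)

def c_map(c, q, sw, sb):
    # One application of the correlation map C(c) at the length fixed point q*.
    u1 = np.sqrt(q) * Z1
    u2 = np.sqrt(q) * (c * Z1 + np.sqrt(max(1.0 - c * c, 0.0)) * Z2)
    return (sw**2 * np.sum(W2 * np.tanh(u1) * np.tanh(u2)) + sb**2) / q

for sw in (0.5, 3.0):                             # ordered vs. chaotic, sb = 0.3
    sb, c = 0.3, 0.98
    q = q_star(sw, sb)
    for _ in range(20):                           # 20 layers of the C-map
        c = c_map(c, q, sw, sb)
    print(f"sigma_w={sw}: q*={q:.3f}, chi_1={chi_1(sw, sb):.3f}, "
          f"c after 20 layers={c:.3f}")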
χ₁ acts like a local stretching factor:
χ₁ < 1: nearby points come closer together
χ₁ > 1: nearby points are driven apart
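Equivalently (our restatement of the same recursion, not an additional result): the squared distance between two nearby points is multiplied by χ₁ at every layer, so after d layers

    \|\delta h^d\|^2 \approx \chi_1^{\,d}\, \|\delta h^0\|^2 ,

giving exponential contraction for χ₁ < 1 and exponential divergence for χ₁ > 1.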
                     Local stretch         Local curvature       Grassmannian length
Ordered (χ₁ < 1):    Exponential decay     Exponential growth    Constant
Chaotic (χ₁ > 1):    Exponential growth    Constant              Exponential growth
References
M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein. On the expressive power of deep neural networks. arXiv:1606.05336.
S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein. Deep Information Propagation. arXiv:1611.01232.
Code to reproduce all results: github.com/ganguli-lab/deepchaos
Riemannian geometry of manifold propagation
Our work: expressivity in neural networks with random weights:
● New framework for analyzing random deep networks using mean field theory and Riemannian geometry
● Random deep nets are exponentially more expressive than shallow: deep → shallow requires exponentially more neurons!
● Can represent exponentially curved decision boundaries in input space
Acknowledgements: BP is supported by NSF IGERT and SIGF. SG and SL thank the Burroughs-Wellcome, Sloan, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research for support.
In the chaotic regime, deep networks with random weights create a space-filling curve that exponentially expands in length without decreasing local curvature, leading to an exponential growth in the global curvature.
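A rough numerical illustration of this claim (our sketch; the finite sampling of θ makes the curvature numbers qualitative only). It propagates a great circle through a chaotic random net and tracks its Euclidean length (sum of chord lengths) and Grassmannian length (total angle swept by the unit tangent):

import numpy as np

rng = np.random.default_rng(2)
N, depth, sigma_w, sigma_b = 1000, 8, 3.0, 0.3    # sigma_w > 1: chaotic regime

# A great circle in input space: x^0(theta) = sqrt(N) (u0 cos(theta) + u1 sin(theta)).
theta = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
u0, u1 = np.linalg.qr(rng.normal(size=(N, 2)))[0].T
x = np.sqrt(N) * (np.outer(np.cos(theta), u0) + np.outer(np.sin(theta), u1))

def lengths(pts):
    # Euclidean length = sum of chord lengths; Grassmannian length = sum of
    # angles between consecutive unit tangents (closed discretized curve).
    chords = np.diff(pts, axis=0, append=pts[:1])
    norms = np.linalg.norm(chords, axis=1)
    t = chords / norms[:, None]
    cosang = np.clip(np.sum(t * np.roll(t, -1, axis=0), axis=1), -1.0, 1.0)
    return norms.sum(), np.arccos(cosang).sum()

for layer in range(1, depth + 1):
    W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(0.0, sigma_b, size=N)
    x = np.tanh(x @ W.T + b)
    le, lg = lengths(x)
    print(f"layer {layer}: Euclidean length {le:10.1f}, Grassmannian length {lg:8.1f}")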