Exponential expressivity in deep neural networks through transient chaos
Ben Poole^1, Subhaneil Lahiri^1, Maithra Raghu^{2,3}, Jascha Sohl-Dickstein^3, Surya Ganguli^1
^1 Stanford University, ^2 Cornell University, ^3 Google Brain
Goal: develop a theoretical understanding of deep neural networks
● Expressivity: represent a large class of functions
● Trainability: tractable algorithms for finding good solutions
● Generalizability: work well in unseen regions of input space
Introduction
Expressivity in random neural networks
Signal propagation in random neural networks
Length propagation
Correlation propagation
[Figure panels: local stretching (χ₁) · local curvature · global curvature (Grassmannian length)]
[Figure: a one-dimensional input manifold propagating through the network, x^0(θ) → x^1(θ) → x^2(θ) → x^3(θ), from input to output]
Fully-connected neural network with nonlinearity φ:

    x_i^l = \phi(h_i^l), \qquad h_i^l = \sum_j W_{ij}^l \, x_j^{l-1} + b_i^l

Independent random Normal weights and biases:

    W_{ij}^l \sim \mathcal{N}\!\left(0, \sigma_w^2 / N_{l-1}\right) \;\text{(weight variance)}, \qquad b_i^l \sim \mathcal{N}\!\left(0, \sigma_b^2\right) \;\text{(bias variance)}
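For concreteness, a minimal runnable sketch of this setup (our illustration, not the authors' code; φ = tanh, and the function name and parameter values are our assumptions):

import numpy as np

def forward(x0, depth, width, sigma_w, sigma_b, seed=0):
    # Propagate x0 through `depth` random layers: h = W x + b, x = tanh(h),
    # with W_ij ~ N(0, sigma_w^2 / N_in) and b_i ~ N(0, sigma_b^2).
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(depth):
        n_in = x.shape[0]
        W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(width, n_in))
        b = rng.normal(0.0, sigma_b, size=width)
        x = np.tanh(W @ x + b)
    return x

x_out = forward(np.ones(784), depth=10, width=1000, sigma_w=2.0, sigma_b=0.3)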
Theory: how do simple input manifolds propagate through a deep network?
● A single point: When does its length grow or shrink, and how fast?
● A pair of points: Do they become more similar or more different? (see the numerical probe below)
● A smooth manifold: How does its curvature and volume change?
Experiment: random deep nets are more expressive than wide shallow nets.
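A quick empirical probe of the pair-of-points question (our sketch; the tanh nonlinearity and the values of σ_w, σ_b are illustrative choices):

import numpy as np

rng = np.random.default_rng(3)
N, depth, sigma_w, sigma_b = 2000, 15, 3.0, 0.3   # sigma_w = 0.5 gives the ordered regime

x1 = rng.normal(size=N)
x2 = x1 + 0.01 * rng.normal(size=N)               # a nearby second point

for layer in range(1, depth + 1):
    W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(0.0, sigma_b, size=N)
    x1, x2 = np.tanh(W @ x1 + b), np.tanh(W @ x2 + b)
    c = (x1 @ x2) / np.sqrt((x1 @ x1) * (x2 @ x2))
    print(f"layer {layer}: correlation = {c:.4f}")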
Self-averaging approximation: for large N_l, the average over neurons in a layer ≈ the average over random weights for one neuron.
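A small numerical check of this approximation (ours, not from the poster): for one wide layer, the neuron average of h_i^2 matches the single-neuron average over random weights, σ_w^2⟨x^2⟩ + σ_b^2.

import numpy as np

rng = np.random.default_rng(1)
N, sigma_w, sigma_b = 10000, 1.5, 0.2

x = np.tanh(rng.normal(size=N))            # arbitrary previous-layer activity
W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
b = rng.normal(0.0, sigma_b, size=N)
h = W @ x + b

q_neurons = np.mean(h ** 2)                                # average over neurons
q_weights = sigma_w ** 2 * np.mean(x ** 2) + sigma_b ** 2  # average over weights
print(q_neurons, q_weights)                # agree up to O(1/sqrt(N)) fluctuations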
Recursion relation for the length of a point as it propagates through the network:

    q^l = \mathcal{V}(q^{l-1} \mid \sigma_w, \sigma_b) = \sigma_w^2 \int \mathcal{D}z \,\phi\!\left(\sqrt{q^{l-1}}\, z\right)^2 + \sigma_b^2, \qquad \mathcal{D}z = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}

q*: fixed point of this iterative map. [Figure: q^l versus depth for increasing σ_w, converging to q*]

For a pair of points with correlation c^{l-1}, the correlation map at the length fixed point is

    c^l = \mathcal{C}(c^{l-1}) = \frac{1}{q^*}\left[\, \sigma_w^2 \int \mathcal{D}z_1\, \mathcal{D}z_2 \,\phi(u_1)\,\phi(u_2) + \sigma_b^2 \right], \qquad u_1 = \sqrt{q^*}\, z_1, \quad u_2 = \sqrt{q^*}\left(c^{l-1} z_1 + \sqrt{1 - (c^{l-1})^2}\, z_2\right)

The correlation map always has a fixed point at c = 1, whose stability is determined by the slope of the correlation map at c = 1:

    \chi_1 = \left.\frac{\partial c^l}{\partial c^{l-1}}\right|_{c=1} = \sigma_w^2 \int \mathcal{D}z \left[\phi'\!\left(\sqrt{q^*}\, z\right)\right]^2

Ordered regime (χ₁ < 1): nearby points become more correlated.
Chaotic regime (χ₁ > 1): nearby points become decorrelated.
[Figure: iterative correlation maps for increasing σ_w]
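These maps are easy to evaluate numerically. A sketch of our own (φ = tanh, Gauss-Hermite quadrature; the authors' released code lives at github.com/ganguli-lab/deepchaos) that iterates the length map to q*, computes χ₁, and iterates the correlation map:

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for E[f(z)] with z ~ N(0, 1).
z, w = hermegauss(80)
w = w / np.sqrt(2.0 * np.pi)
Z1, Z2 = np.meshgrid(z, z)
W2 = np.outer(w, w)

def q_star(sw, sb, q=1.0, iters=500):
    # Length map: q <- V(q) = sw^2 * E[tanh(sqrt(q) z)^2] + sb^2.
    for _ in range(iters):
        q = sw**2 * np.sum(w * np.tanh(np.sqrt(q) * z) ** 2) + sb**2
    return q

def chi_1(sw, sb):
    # Slope of the correlation map at c = 1: sw^2 * E[tanh'(sqrt(q*) z)^2].
    q = q_star(sw, sb)
    dphi = 1.0 - np.tanh(np.sqrt(q) * z) ** 2     # tanh'(h) = 1 - tanh(h)^2
    return sw**2 * np.sum(w * dphi ** 2)

def c_map(c, q, sw, sb):
    # One application of the correlation map C(c) at the length fixed point q*.
    u1 = np.sqrt(q) * Z1
    u2 = np.sqrt(q) * (c * Z1 + np.sqrt(max(1.0 - c * c, 0.0)) * Z2)
    return (sw**2 * np.sum(W2 * np.tanh(u1) * np.tanh(u2)) + sb**2) / q

for sw in (0.5, 3.0):                             # ordered vs. chaotic, sb = 0.3
    sb, c = 0.3, 0.98
    q = q_star(sw, sb)
    for _ in range(20):                           # 20 layers of the C-map
        c = c_map(c, q, sw, sb)
    print(f"sigma_w={sw}: q*={q:.3f}, chi_1={chi_1(sw, sb):.3f}, "
          f"c after 20 layers={c:.3f}")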
χ₁ acts like a local stretching factor:
χ₁ < 1: nearby points come closer together
χ₁ > 1: nearby points are driven apart
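Equivalently (our restatement of the same recursion, not an additional result): the squared distance between two nearby points is multiplied by χ₁ at every layer, so after d layers

    \|\delta h^d\|^2 \approx \chi_1^{\,d}\, \|\delta h^0\|^2 ,

giving exponential contraction for χ₁ < 1 and exponential divergence for χ₁ > 1.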
                     Local stretch         Local curvature       Grassmannian length
Ordered (χ₁ < 1):    Exponential decay     Exponential growth    Constant
Chaotic (χ₁ > 1):    Exponential growth    Constant              Exponential growth
References
M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein. On the expressive power of deep neural networks. arXiv:1606.05336.
S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein. Deep Information Propagation. arXiv:1611.01232.
Code to reproduce all results: github.com/ganguli-lab/deepchaos
Riemannian geometry of manifold propagation
Our work: expressivity in neural networks with random weights:
● New framework for analyzing random deep networks using mean field theory and Riemannian geometry
● Random deep nets are exponentially more expressive than shallow: deep → shallow requires exponentially more neurons!
● Can represent exponentially curved decision boundaries in input space
Acknowledgements: BP is supported by NSF IGERT and SIGF. SG and SL thank the Burroughs-Wellcome, Sloan, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research for support.
In the chaotic regime, deep networks with random weights create a space-filling curve that exponentially expands in length without decreasing local curvature, leading to an exponential growth in the global curvature.
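A rough numerical illustration of this claim (our sketch; the finite sampling of θ makes the curvature numbers qualitative only). It propagates a great circle through a chaotic random net and tracks its Euclidean length (sum of chord lengths) and Grassmannian length (total angle swept by the unit tangent):

import numpy as np

rng = np.random.default_rng(2)
N, depth, sigma_w, sigma_b = 1000, 8, 3.0, 0.3    # sigma_w > 1: chaotic regime

# A great circle in input space: x^0(theta) = sqrt(N) (u0 cos(theta) + u1 sin(theta)).
theta = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
u0, u1 = np.linalg.qr(rng.normal(size=(N, 2)))[0].T
x = np.sqrt(N) * (np.outer(np.cos(theta), u0) + np.outer(np.sin(theta), u1))

def lengths(pts):
    # Euclidean length = sum of chord lengths; Grassmannian length = sum of
    # angles between consecutive unit tangents (closed discretized curve).
    chords = np.diff(pts, axis=0, append=pts[:1])
    norms = np.linalg.norm(chords, axis=1)
    t = chords / norms[:, None]
    cosang = np.clip(np.sum(t * np.roll(t, -1, axis=0), axis=1), -1.0, 1.0)
    return norms.sum(), np.arccos(cosang).sum()

for layer in range(1, depth + 1):
    W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(0.0, sigma_b, size=N)
    x = np.tanh(x @ W.T + b)
    le, lg = lengths(x)
    print(f"layer {layer}: Euclidean length {le:10.1f}, Grassmannian length {lg:8.1f}")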