13 Neural Networks - Virginia Tech
courses.cs.vt.edu/cs4804/Fall18/slide_pdfs/13 Neural Networks.pdf

Neural Networks - Intro to AI - Bert Huang - Virginia Tech

TRANSCRIPT

Page 1:

Neural Networks

Intro to AI Bert Huang

Virginia Tech

Page 2:

Outline

• Biological inspiration for artificial neural networks

• Linear vs. nonlinear functions

• Learning with neural networks: back propagation

Page 3:

https://en.wikipedia.org/wiki/Neuron#/media/File:Chemical_synapse_schema_cropped.jpg

Page 4:

https://en.wikipedia.org/wiki/Neuron#/media/File:Chemical_synapse_schema_cropped.jpg

https://en.wikipedia.org/wiki/Neuron#/media/File:GFPneuron.png

Page 5:

Parameterizing p(y|x)

p(y|x) := f(x), \quad f : \mathbb{R}^d \to [0, 1]

f(x) := \frac{1}{1 + \exp(-w^\top x)}


Page 7:

Logistic Function

\sigma(x) = \frac{1}{1 + \exp(-x)}

Page 8:

Logistic Function

\sigma(x) = \frac{1}{1 + \exp(-x)}

\lim_{x \to -\infty} \sigma(x) = \lim_{x \to -\infty} \frac{1}{1 + \exp(-x)} = 0.0

\sigma(0) = \frac{1}{1 + \exp(-0)} = \frac{1}{1 + 1} = 0.5

\lim_{x \to \infty} \sigma(x) = \lim_{x \to \infty} \frac{1}{1 + \exp(-x)} = \frac{1}{1} = 1.0
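These three values can be checked numerically. A minimal sketch in Python (the helper name `sigmoid` is just an illustrative choice):

```python
import math

def sigmoid(x):
    """Logistic function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# sigma(0) is exactly 0.5, and the function saturates toward
# 0 in the left tail and 1 in the right tail.
print(sigmoid(0.0))   # 0.5
print(sigmoid(-20.0)) # very close to 0
print(sigmoid(20.0))  # very close to 1
```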

Page 9:

From Features to Probability

f(x) := \frac{1}{1 + \exp(-w^\top x)}

Page 10:

Parameterizing p(y|x)

p(y|x) := f(x), \quad f : \mathbb{R}^d \to [0, 1]

f(x) := \frac{1}{1 + \exp(-w^\top x)}

[Diagram: inputs x1 ... x5 connect through weights w1 ... w5 to output y]

Page 11:

Multi-Layered Perceptron

[Diagram: inputs x1 ... x5 connect through weights w1 ... w5 to hidden unit h1]

Page 12:

Multi-Layered Perceptron

[Diagram: inputs x1 ... x5 feed hidden units h1 and h2, which feed output y]

raw data (pixel values) → representation (shapes, round, shadows) → prediction (faces)

Page 13:

Multi-Layered Perceptron

[Diagram: inputs x1 ... x5, hidden units h1 and h2, output y]

h = [h_1, h_2]^\top

h_1 = \sigma(w_{11}^\top x) \quad h_2 = \sigma(w_{12}^\top x)

p(y|x) = \sigma(w_{21}^\top h)

p(y|x) = \sigma\left(w_{21}^\top \left[\sigma(w_{11}^\top x), \sigma(w_{12}^\top x)\right]^\top\right)
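The composed function on this slide can be sketched directly in Python. This is a minimal forward pass for the two-hidden-unit network; the weight values below are made up purely for illustration:

```python
import math

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    """Inner product w^T x for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mlp_forward(x, w11, w12, w21):
    """Forward pass of the slide's network:
    h1 = sigma(w11^T x), h2 = sigma(w12^T x), p(y|x) = sigma(w21^T h)."""
    h = [sigmoid(dot(w11, x)), sigmoid(dot(w12, x))]
    return sigmoid(dot(w21, h))

# Illustrative (made-up) weights for a 2-feature input:
p = mlp_forward([1.0, 2.0], [0.5, -0.25], [-0.3, 0.8], [1.0, -1.0])
```

Note that the output is itself a logistic unit, so the result is always a valid probability in (0, 1).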

Page 14:

Decision Surface: Logistic Regression

Page 15:

Decision Surface: 2-Layer, 2 Hidden Units

Page 16:

Decision Surface: 2-Layer, More Hidden Units

(left: 3 hidden units; right: 10 hidden units)

Page 17:

Decision Surface: More Layers, More Hidden Units

(left: 4 layers, 10 hidden units per layer; right: 10 layers, 5 hidden units per layer)

Page 18:

Training Neural Networks

L(W) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i, W), y_i) \quad \text{(average training error)}

\min_W L(W)

Gradient Descent

W_j \leftarrow W_j - \alpha \frac{\partial L}{\partial W_j}
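The update rule above can be sketched in a few lines. This is a generic gradient descent loop applied to a made-up one-dimensional loss (not one of the course's networks), just to show the W ← W − α ∂L/∂W step in action:

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    """Repeatedly apply the update W <- W - alpha * dL/dW."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# Toy loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); minimum at w = 3.
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

With this step size the iterates contract toward the minimum geometrically, which is why 100 steps is plenty for the toy example.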

Page 19:

Gradient Descent

W_j \leftarrow W_j - \alpha \frac{\partial L}{\partial W_j}

[Plot: loss L(W) against W, with example points W_1, W_2, W_3 and values L(W_1), L(W_2), L(W_3)]

gradient very positive: take a big step left
gradient almost zero: take a tiny step left
gradient slightly negative: take a medium step right

Page 20:

Approximate Q-Learning

\hat{Q}(s, a) := g(s, a, \theta) := \theta_1 f_1(s, a) + \theta_2 f_2(s, a) + \ldots + \theta_d f_d(s, a)

\theta_i \leftarrow \theta_i + \alpha \left( R(s) + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right) \frac{\partial g}{\partial \theta_i}

\theta_i \leftarrow \theta_i + \alpha \left( R(s) + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right) f_i(s, a)
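The linear case above is simple enough to write out. A minimal sketch of one approximate Q-learning step, with made-up feature values and hyperparameters chosen only for illustration:

```python
def q_value(theta, features):
    """Q-hat(s, a) = sum_i theta_i * f_i(s, a): linear function approximation."""
    return sum(t * f for t, f in zip(theta, features))

def q_update(theta, features, reward, best_next_q, alpha=0.5, gamma=0.9):
    """One step of the slide's update:
    theta_i <- theta_i + alpha * (R(s) + gamma * max_a' Q(s',a') - Q(s,a)) * f_i(s,a)."""
    td_error = reward + gamma * best_next_q - q_value(theta, features)
    return [t + alpha * td_error * f for t, f in zip(theta, features)]

# Example step: zero weights, features f(s,a) = [1, 2], reward 1, terminal next state.
theta = q_update([0.0, 0.0], [1.0, 2.0], reward=1.0, best_next_q=0.0)
```

The gradient of the linear model g with respect to θ_i is just the feature value f_i(s, a), which is why the second form of the slide's update drops ∂g/∂θ_i for f_i(s, a).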

Page 21:

Back Propagation

• Back propagation:

• Compute hidden unit activations: forward propagation

• Compute gradient at output layer: error

• Propagate error back one layer at a time

• Chain rule via dynamic programming

Page 22:

Page 23:

Chain Rule Review

f(g(x))

\frac{d\, f(g(x))}{dx} = \frac{d\, f(g(x))}{d\, g(x)} \cdot \frac{d\, g(x)}{dx}

[Diagram: x → g(x) → f(g(x))]

Page 24:

Chain Rule on a More Complex Function

h(f(a) + g(b))

\frac{d\, h(f(a) + g(b))}{da} = \frac{d\, h(f(a) + g(b))}{d\,(f(a) + g(b))} \cdot \frac{d\, f(a)}{da}

\frac{d\, h(f(a) + g(b))}{db} = \frac{d\, h(f(a) + g(b))}{d\,(f(a) + g(b))} \cdot \frac{d\, g(b)}{db}
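The chain rule is easy to sanity-check numerically by comparing the analytic derivative with a finite difference. The choices f = sin and g(x) = x² below are illustrative, not from the slides:

```python
import math

def composed(x):
    """f(g(x)) with f = sin and g(x) = x^2."""
    return math.sin(x * x)

def chain_rule_derivative(x):
    """d f(g(x))/dx = f'(g(x)) * g'(x) = cos(x^2) * 2x."""
    return math.cos(x * x) * 2.0 * x

def finite_difference(fn, x, eps=1e-6):
    """Central-difference approximation of fn'(x)."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)
```

The two derivatives agree to several decimal places at any test point, which is the same check commonly used to debug hand-written gradients.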

Page 25:

Back to Neural Networks

[Diagram: raw input x → hidden layer h_1 → ... → hidden layer h_{n-1} → final output y, with weights w_1, ..., w_{n-1}, w_n between layers]

h_1 = f(x, w_1)

h_{n-1} = f(h_{n-2}, w_{n-1})

y = f(h_{n-1}, w_n)

Page 26:

Back to Neural Networks

[Diagram: raw input x → hidden layer h_1 → ... → hidden layer h_{n-1} → final output y, with weights w_1, ..., w_{n-1}, w_n between layers]

h_1 = f(x, w_1)

h_{n-1} = f(h_{n-2}, w_{n-1})

y = f(h_{n-1}, w_n)

L(y)

\frac{dL}{dw_n} = \frac{dL}{dy} \cdot \frac{dy}{dw_n}

Page 27:

Intuition of Back Propagation

• At each layer, calculate how much changing the input changes the final output (derivative of final output w.r.t. the layer's input)

• Calculate directly for the last layer

• For preceding layers, use the calculation from the next layer and work backwards through the network

• Use that derivative to find how changing the weights affects the error of the final output

Page 28:

FYI: Matrix Form

h_1 = s(W_1 x)
h_2 = s(W_2 h_1)
...
h_{m-1} = s(W_{m-1} h_{m-2})
f(x, W) = s(W_m h_{m-1})

J(W) = \ell(f(x, W))

(You will not be tested on this matrix form in this course.)

Page 29:

FYI: Matrix Gradient Recipe

h_1 = s(W_1 x)
h_2 = s(W_2 h_1)
...
h_{m-1} = s(W_{m-1} h_{m-2})
f(x, W) = s(W_m h_{m-1})

J(W) = \ell(f(x, W))

\delta_m = \ell'(f(x, W))

\nabla_{W_m} J = \delta_m h_{m-1}^\top

\delta_{m-1} = (W_m^\top \delta_m) \odot s'(W_{m-1} h_{m-2})

\nabla_{W_{m-1}} J = \delta_{m-1} h_{m-2}^\top

\delta_i = (W_{i+1}^\top \delta_{i+1}) \odot s'(W_i h_{i-1})

\nabla_{W_i} J = \delta_i h_{i-1}^\top

\nabla_{W_1} J = \delta_1 x^\top

(You will not be tested on this matrix form in this course.)

Page 30:

FYI: Matrix Gradient Recipe

Feed-Forward Propagation:

h_1 = s(W_1 x)
h_i = s(W_i h_{i-1})
f(x, W) = s(W_m h_{m-1})
J(W) = \ell(f(x, W))

Back Propagation:

\delta_m = \ell'(f(x, W))
\delta_i = (W_{i+1}^\top \delta_{i+1}) \odot s'(W_i h_{i-1})
\nabla_{W_i} J = \delta_i h_{i-1}^\top
\nabla_{W_1} J = \delta_1 x^\top

(You will not be tested on this matrix form in this course.)
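A minimal NumPy sketch of this recipe, assuming a logistic activation s (so s'(z) = s(z)(1 − s(z)) can be recovered from stored activations) and leaving the output-layer error δ_m to the caller, as the slides do with δ_m = ℓ′(f(x, W)):

```python
import numpy as np

def s(z):
    """Logistic activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Ws):
    """Feed-forward pass h_i = s(W_i h_{i-1}); returns all activations, hs[0] = x."""
    hs = [np.asarray(x)]
    for W in Ws:
        hs.append(s(W @ hs[-1]))
    return hs

def backward(hs, Ws, delta):
    """Back propagation per the slides' recipe:
    grad W_i = delta_i h_{i-1}^T,
    delta_i = (W_{i+1}^T delta_{i+1}) ⊙ s'(W_i h_{i-1}),
    where ⊙ is the elementwise product and s'(z) = s(z)(1 - s(z))
    is reconstructed from the stored activations."""
    grads = [None] * len(Ws)
    for i in reversed(range(len(Ws))):
        grads[i] = np.outer(delta, hs[i])
        if i > 0:
            delta = (Ws[i].T @ delta) * hs[i] * (1.0 - hs[i])
    return grads
```

A standard way to validate such an implementation is a finite-difference gradient check on a small random network, perturbing one weight at a time and comparing against the returned gradients.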

Page 31:

Other New Aspects of Deep Learning

• GPU computation

• Differentiable programming

• Automatic differentiation

• Neural network structures

Page 32:

Types of Neural Network Structures

• Feed-forward

• Recurrent neural networks (RNNs)

• Good for analyzing sequences (text, time series)

• Convolutional neural networks (convnets, CNNs)

• Good for analyzing spatial data (images, videos)

Page 33:

Outline

• Biological inspiration for artificial neural networks

• Linear vs. nonlinear functions

• Learning with neural networks: back propagation