Deep Learning - Simon Fraser University


Deep Learning
Apala Guha


Deep Learning Platforms

• Caffe - Berkeley Vision and Learning Center

• AlexNet - U Toronto

• cuda-convnet - a fork of the AlexNet code

• ConvNetJS - works in your browser

• CaffeOnSpark - Yahoo Labs

• TensorFlow - Google

• Neuromorphic processor (IBM), DaDianNao


Outline

• Neural networks (NNs)

• Overview

• Brain analogy

• How NNs work

• Parallel NN computation

• Convolutional Neural Networks (ConvNets)

• Layers

• Assignment


Neural Networks



Biological Neural Network


Neural Network Mechanism

[Figure: computational graph - inputs x, y, z; an add node computes q = x + y, which feeds a multiply node producing f]

f = (x + y)z,  q = x + y

Neural Network Mechanism

[Figure: the same graph with example values x = -2, y = 5, z = -4, giving q = 3 and f = -12]

f = (x + y)z = qz,  q = x + y

Neural Network Mechanism

[Figure: the same graph annotated with backward-flowing gradients for x = -2, y = 5, z = -4: df/df = 1, df/dz = q = 3, df/dq = z = -4, df/dx = -4, df/dy = -4]

gradients (influence):

df/dq = z

df/dz = q

df/dx = df/dq * dq/dx = z

df/dy = df/dq * dq/dy = z
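The same computation written out as code may make the chain rule concrete. This is a minimal sketch (not from the slides) of the forward and backward passes for f = (x + y)z with the example values above:

```python
# Forward and backward passes through the graph f = (x + y) * z,
# using the example values from the slide: x = -2, y = 5, z = -4.

x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y              # q = 3
f = q * z              # f = -12

# backward pass (chain rule)
df_dq = z              # -4
df_dz = q              # 3
df_dx = df_dq * 1.0    # dq/dx = 1, so df/dx = -4
df_dy = df_dq * 1.0    # dq/dy = 1, so df/dy = -4

print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0
```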

Neural Network Mechanism

• Forward propagation calculates the values at each node

• each node/neuron/unit calculates a linear combination of its inputs: w1x1 + w2x2 + … + wkxk

• an activation function, e.g. sigmoid, is applied to this result
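As an illustration of the two steps above, here is a minimal sketch of a single unit, with made-up weights and inputs:

```python
import numpy as np

def sigmoid(a):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([0.5, -1.0, 2.0])   # inputs x1..xk (made-up values)
w = np.array([0.1, 0.4, -0.3])   # weights w1..wk (made-up values)

a = np.dot(w, x)                 # linear combination w1*x1 + ... + wk*xk
out = sigmoid(a)                 # activation function applied to the result
print(out)
```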


Neural Network Mechanism

• Backward propagation

• the error/cost/loss is measured at the output layer

• the gradient of the loss with respect to each weight is calculated

• the gradient and a learning rate are used to update the weights, similar to gradient descent (see the sketch below)

• regularization is also used

• Forward and backward propagation run in alternating passes for several iterations
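A minimal sketch of one such weight update, assuming the gradient of the loss with respect to the weights has already been computed by backpropagation; the hyperparameter values are made up:

```python
import numpy as np

learning_rate = 0.01
reg_strength = 1e-4                  # L2 regularization strength (assumed)

w = np.array([0.1, 0.4, -0.3])       # current weights
grad = np.array([0.05, -0.2, 0.1])   # dLoss/dw, as produced by backprop

# gradient-descent step; L2 regularization adds reg_strength * w
# to the gradient, pulling the weights toward zero
w = w - learning_rate * (grad + reg_strength * w)
print(w)
```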



Parallelism in NN computation

• Embarrassingly parallel

• CPUs: cores, vectors

• GPUs

• distributed machines

• Many levels of parallelism

• use each training example in parallel (similar to stochastic gradient descent)

• pipeline parallelism across layers

• unit/neuron parallelism

• weight-multiplication parallelism within neurons
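As an illustration of example-level and weight-multiplication parallelism, a minimal NumPy sketch in which a whole batch of examples passes through one layer as a single matrix multiply (all sizes are made up):

```python
import numpy as np

batch = np.random.rand(64, 100)    # 64 training examples, 100 inputs each
weights = np.random.rand(100, 50)  # one layer mapping 100 inputs to 50 units

# every example (row) and every weight product can be computed
# independently; BLAS on CPUs or cuBLAS on GPUs parallelizes this
# matrix multiply internally
out = 1.0 / (1.0 + np.exp(-(batch @ weights)))   # sigmoid activation
print(out.shape)   # (64, 50)
```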



Convolution

• the output is a dot product of the kernel and the input: w1x1 + w2x2 + … + wkxk

• each kernel looks for some feature
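A minimal sketch of this sliding dot product for a single-channel input, with made-up sizes, stride 1, and no padding:

```python
import numpy as np

inp = np.random.rand(8, 8)       # input (made-up size)
kernel = np.random.rand(3, 3)    # one kernel (made-up size)

out_h = inp.shape[0] - kernel.shape[0] + 1   # 6 valid positions vertically
out_w = inp.shape[1] - kernel.shape[1] + 1   # 6 valid positions horizontally
out = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        patch = inp[i:i+3, j:j+3]
        out[i, j] = np.sum(patch * kernel)   # dot product of kernel and patch

print(out.shape)   # (6, 6)
```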

Convolutional Neural Network

• input image: 32x32x3

• kernel bank: five 5x5x3 kernels

• output: five 28x28x1 feature maps
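With stride 1 and no padding (an assumption consistent with these numbers), a 5x5 kernel fits in 32 - 5 + 1 = 28 positions along each axis, which is where the 28x28 comes from. A minimal sketch checking the shapes:

```python
import numpy as np

image = np.random.rand(32, 32, 3)     # 32x32 RGB input image
kernels = np.random.rand(5, 5, 5, 3)  # bank of 5 kernels, each 5x5x3

out = np.zeros((28, 28, 5))           # 28 = 32 - 5 + 1
for k in range(5):                    # one feature map per kernel
    for i in range(28):
        for j in range(28):
            patch = image[i:i+5, j:j+5, :]         # 5x5x3 patch
            out[i, j, k] = np.sum(patch * kernels[k])

print(out.shape)   # (28, 28, 5)
```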


ConvNet Layers

• Convolution

• Activation, e.g. sigmoid, ReLU, etc.

• Pooling: downsampling (see the sketch below)

• e.g. each 2x2 block -> 1x1

• take the max value in the 2x2 block

• Convolution and activation layers usually come in pairs, interspersed with pooling layers

• The task is to learn the weights in the convolution layers
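A minimal sketch of the 2x2 max pooling described above, on a made-up 28x28 input:

```python
import numpy as np

inp = np.random.rand(28, 28)   # e.g. one feature map (made-up size)
out = np.zeros((14, 14))       # each spatial dimension is halved

for i in range(14):
    for j in range(14):
        # take the max value in each non-overlapping 2x2 block
        out[i, j] = np.max(inp[2*i:2*i+2, 2*j:2*j+2])

print(out.shape)   # (14, 14)
```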


ConvNet Visualization

Applying ConvNets

• What to do when there is:

• high bias => the NN underfits => make the NN bigger

• high variance => the NN overfits => use more training examples



Assignment 7

• Use Caffe in CPU-only mode

• Use the pre-trained CaffeNet model

• Generate the image classification labels

• Generate layer visualizations
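A minimal pycaffe sketch of the classification step. The model and image paths are assumptions (point them at your Caffe install and model download), and mean subtraction is omitted for brevity:

```python
import caffe

caffe.set_mode_cpu()   # CPU-only mode, as the assignment requires

# assumed paths to the pre-trained CaffeNet model files
net = caffe.Net('models/bvlc_reference_caffenet/deploy.prototxt',
                'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# preprocessing: match the network's expected input layout
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))     # HxWxC -> CxHxW
transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
transformer.set_raw_scale('data', 255)           # [0,1] -> [0,255]

image = caffe.io.load_image('cat.jpg')           # assumed input image
net.blobs['data'].data[...] = transformer.preprocess('data', image)

out = net.forward()
print('predicted class index:', out['prob'][0].argmax())
```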
