Intro to Neural Networks
Prof. Kate Saenko, [email protected]
Today
• Neural Networks
– aka Multilayer Perceptrons,
– aka Deep Belief Networks/Deep Learning
• Unsupervised Neural Networks
– Autoencoders
– Sparse Autoencoders
Intro to Neural Networks
Slides by Andrew Ng
Neural Networks
Origins: algorithms that try to mimic the brain. Very widely used in the 80s and early 90s; popularity diminished in the late 90s. Recent resurgence: state-of-the-art technique for many applications.
The “one learning algorithm” hypothesis
Auditory cortex learns to see [Roe et al., 1992]
[Figure: brain diagram with the auditory cortex highlighted]
The “one learning algorithm” hypothesis
Somatosensory cortex learns to see [Metin & Frost, 1989]
[Figure: brain diagram with the somatosensory cortex highlighted]
Sensor representations in the brain
• Seeing with your tongue
• Human echolocation (sonar)
• Haptic belt: direction sense
• Implanting a 3rd eye
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]
Neuron in the brain
Neurons in the brain
[Credit: US National Institutes of Health, National Institute on Aging]
Neuron model: Logistic unit
Sigmoid (logistic) activation function: h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))
One Unit is a Logistic Regression Classifier
Want 0 ≤ h_θ(x) ≤ 1
h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) is the sigmoid (logistic) function
[Plot: sigmoid curve rising from 0 to 1, crossing 0.5 at z = 0]
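A minimal NumPy sketch (not from the slides) of this single logistic unit; the weights and input below are illustrative only:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}); squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    # One unit = logistic regression: h_theta(x) = g(theta^T x).
    # x and theta are 1-D arrays; x[0] is the bias term (= 1).
    return sigmoid(theta @ x)

x = np.array([1.0, 2.0, -1.0])      # [bias, x1, x2] (illustrative)
theta = np.array([0.5, 1.0, -2.0])  # illustrative weights
print(logistic_unit(x, theta))      # a value between 0 and 1
```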
Interpretation of Hypothesis Output
h_θ(x) = estimated probability that y = 1 on input x
h_θ(x) = P(y = 1 | x; θ) — “probability that y = 1, given x, parameterized by θ”
Example: if h_θ(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant
Neural Network
[Diagram: 3-layer network — Layer 1 (input), Layer 2 (hidden), Layer 3 (output)]
Neural Network
a_i^(j) = “activation” of unit i in layer j
Θ^(j) = matrix of weights controlling the function mapping from layer j to layer j+1
If the network has s_j units in layer j and s_(j+1) units in layer j+1, then Θ^(j) will be of dimension s_(j+1) × (s_j + 1).
Forward propagation: Vectorized implementation
z^(2) = Θ^(1) x,   a^(2) = g(z^(2))
Add a_0^(2) = 1.
z^(3) = Θ^(2) a^(2),   h_Θ(x) = a^(3) = g(z^(3))
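A minimal NumPy sketch of this vectorized forward pass for a 3-layer network; the shapes follow the dimension rule above, and the random weights are purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    # Vectorized forward propagation for a 3-layer network.
    # Theta1 has shape (s2, s1 + 1), Theta2 has shape (s3, s2 + 1),
    # matching the dimension rule s_{j+1} x (s_j + 1).
    a1 = np.concatenate(([1.0], x))              # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))    # add bias unit a0^(2) = 1
    z3 = Theta2 @ a2
    return sigmoid(z3)                           # h_Theta(x) = a^(3)

# Example: 3 inputs, 3 hidden units, 1 output, random weights
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))
Theta2 = rng.normal(size=(1, 4))
print(forward_propagate(np.array([0.5, -1.0, 2.0]), Theta1, Theta2))
```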
Neural Network learning its own features
[Diagram: 3-layer network — the hidden-layer activations a^(2) act as learned features feeding the output unit]
Other network architectures
[Diagram: 4-layer network — Layer 1 (input), Layers 2 and 3 (hidden), Layer 4 (output)]
Handwritten digit classification
[Courtesy of Yann LeCun]
Neural Network (Classification)
L = total no. of layers in network
s_l = no. of units (not counting bias unit) in layer l
[Diagram: 4-layer network — Layer 1 (input), Layers 2 and 3 (hidden), Layer 4 (output)]
Binary classification: 1 output unit, y ∈ {0, 1}
Multi-class classification (K classes): K output units, y ∈ R^K
E.g. pedestrian, car, motorcycle, truck encoded as
y = [1;0;0;0], [0;1;0;0], [0;0;1;0], [0;0;0;1] respectively
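A small sketch of this one-hot target encoding for the K-class case; the helper function and class list are illustrative, not from the slides:

```python
import numpy as np

def one_hot(label, K):
    # Encode class index `label` (0..K-1) as a K-dimensional target vector.
    y = np.zeros(K)
    y[label] = 1.0
    return y

classes = ["pedestrian", "car", "motorcycle", "truck"]
print(one_hot(classes.index("car"), K=4))   # [0. 1. 0. 0.]
```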
Cost function
Neural network: J(Θ) = training error + regularization, where
training error = −(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} [ y_k^(i) log(h_Θ(x^(i)))_k + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ]
regularization = (λ / 2m) Σ_{l} Σ_{i} Σ_{j} (Θ_ji^(l))²
Cost function: Training Error
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}   (m examples)
How to choose parameters θ?
Cost function: Training Error
Linear regression: J(θ) = (1/m) Σ_{i=1}^{m} ½ (h_θ(x^(i)) − y^(i))²
With a sigmoid hypothesis this squared-error cost is “non-convex”; the logistic cost used instead is “convex”.
Cost function: Training Error
If y = 1: cost(h_θ(x), y) = −log(h_θ(x))
[Plot: cost falls to 0 as h_θ(x) → 1 and grows without bound as h_θ(x) → 0]
Cost function: Training Error
If y = 0: cost(h_θ(x), y) = −log(1 − h_θ(x))
[Plot: cost falls to 0 as h_θ(x) → 0 and grows without bound as h_θ(x) → 1]
Cost function: Training Error
Combining both cases: J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
Cost function: Training Error
To fit parameters θ: min_θ J(θ)
To make a prediction given new x: output h_θ(x)
Cost function
Neural network: J(Θ) = training error + regularization, where
training error = −(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} [ y_k^(i) log(h_Θ(x^(i)))_k + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ]
regularization = (λ / 2m) Σ_{l=1}^{L−1} Σ_{i} Σ_{j} (Θ_ji^(l))²
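A sketch of this cost in NumPy, assuming the network outputs and one-hot targets have been stacked into (m, K) arrays; the bias column of each weight matrix is excluded from the regularization term, as in the formula:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    # H: (m, K) array of network outputs (h_Theta(x^(i)))_k
    # Y: (m, K) array of one-hot targets
    # Thetas: list of weight matrices Theta^(l)
    # lam: regularization strength lambda
    m = Y.shape[0]
    eps = 1e-12                                   # avoid log(0)
    training_error = -np.sum(Y * np.log(H + eps) +
                             (1 - Y) * np.log(1 - H + eps)) / m
    # Regularize all weights except the bias column (first column of each Theta)
    regularization = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return training_error + regularization
```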
Gradient computation
Need code to compute:
– J(Θ)
– ∂J(Θ) / ∂Θ_ij^(l)
Use the “Backpropagation algorithm”: an efficient way to compute the gradient. It computes the gradient incrementally by “propagating” errors backwards through the network.
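A minimal sketch of backpropagation for the 3-layer network used earlier, one training example at a time (unregularized; this follows the standard derivation for a sigmoid output with the cross-entropy cost, not code from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single(x, y, Theta1, Theta2):
    # Gradient of the (unregularized) cost for ONE training example.
    # Gradients over a training set are accumulated by summing these
    # and dividing by m.
    a1 = np.concatenate(([1.0], x))               # input with bias
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))     # hidden with bias
    z3 = Theta2 @ a2
    a3 = sigmoid(z3)                              # output h_Theta(x)

    delta3 = a3 - y                               # output-layer "error"
    # Propagate the error backwards (drop the bias unit's error term)
    delta2 = (Theta2.T @ delta3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))

    grad_Theta2 = np.outer(delta3, a2)            # dJ/dTheta2
    grad_Theta1 = np.outer(delta2, a1)            # dJ/dTheta1
    return grad_Theta1, grad_Theta2
```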
Unsupervised Neural Networks
Unsupervised Learning
• So far, we talked about classification
• We can also apply the same algorithm to regression, or to unsupervised learning, by setting the outputs equal to the inputs
– Autoencoder
– Sparse Autoencoder
– Restricted Boltzmann Machine (RBM)
[Diagram: autoencoder network — the output layer reconstructs the input]
Autoencoder
• The identity function seems trivial, but…
• …by putting constraints on the hidden units, we can discover interesting structure
– Use few units, enforce sparsity, etc.
• Example (sketched below):
– Learn autoencoder on 10x10 pixel patches
– Use only 50 hidden nodes
– Forces hidden nodes to “compress” data
• Similar to PCA, sparse coding, LLE, etc.
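A sketch of the 10x10-patch autoencoder from the example above (100 inputs, 50 hidden units); the weights are random for illustration and training is not shown — it would minimize the reconstruction error over all patches, e.g. with backpropagation as before:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    # Autoencoder: the target is the input itself.
    # x: 100-dim patch (10x10 pixels); hidden layer: 50 units.
    h = sigmoid(W1 @ x + b1)          # encode: 100 -> 50 ("compression")
    x_hat = sigmoid(W2 @ h + b2)      # decode: 50 -> 100 (reconstruction)
    return x_hat, h

def reconstruction_error(x, x_hat):
    # Squared reconstruction error (1/2)||x_hat - x||^2
    return 0.5 * np.sum((x_hat - x) ** 2)

# Example: random weights, one random 10x10 patch
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.01, size=(50, 100)), np.zeros(50)
W2, b2 = rng.normal(scale=0.01, size=(100, 50)), np.zeros(100)
x = rng.random(100)
x_hat, h = autoencoder_forward(x, W1, b1, W2, b2)
print(reconstruction_error(x, x_hat))
```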
Sparse Autoencoder
• Think of a neuron as being "active" (or as "firing") if its output value is close to 1, or as being "inactive" if its output value is close to 0.
• We would like to constrain the neurons to be inactive most of the time
• Same idea as sparse coding, but achieved by a different method
Sparse Autoencoder
• We will write a_j^(2)(x) to denote the activation of hidden unit j when the network is given a specific input x
• The average activation of hidden unit j (averaged over the training set, computed in the sketch below) is
ρ̂_j = (1/m) Σ_{i=1}^{m} a_j^(2)(x^(i))
• We would like to (approximately) enforce the constraint ρ̂_j = ρ,
where ρ is a sparsity parameter, typically a small value close to zero (say ρ = 0.05).
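A sketch of computing the average activation ρ̂_j over a training set; the variable names are illustrative, and the hidden layer is assumed to use the sigmoid activation as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def average_hidden_activation(X, W1, b1):
    # X: (m, n) matrix of training inputs, one example per row.
    # Returns rho_hat: the activation of each hidden unit averaged
    # over the m training examples.
    A2 = sigmoid(X @ W1.T + b1)       # (m, hidden) activations a_j^(2)(x^(i))
    return A2.mean(axis=0)            # rho_hat_j for each hidden unit j
```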
RECALL: Gradient computation
Need code to compute:
– J(Θ)
– ∂J(Θ) / ∂Θ_ij^(l)
Use the “Backpropagation algorithm”: an efficient way to compute the gradient. It computes the gradient incrementally by “propagating” errors backwards through the network.
Overall cost: training error + regularization + sparsity penalty
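A sketch of the sparsity penalty, assuming the KL-divergence form used in the Stanford sparse-autoencoder notes linked below; β (the penalty weight) and the example ρ̂ values are illustrative:

```python
import numpy as np

def kl_sparsity_penalty(rho_hat, rho=0.05, beta=3.0):
    # KL(rho || rho_hat_j) summed over hidden units, scaled by beta.
    # Added to the usual cost (training error + regularization) so that
    # each rho_hat_j is pushed towards the sparsity target rho.
    kl = (rho * np.log(rho / rho_hat) +
          (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * np.sum(kl)

# Example: hidden units that are active too often are penalized more
print(kl_sparsity_penalty(np.array([0.05, 0.2, 0.5])))
```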
Sparse Autoencoder
http://deeplearning.stanford.edu/wiki/index.php/Visualizing_a_Trained_Autoencoder
Sparse Autoencoder
• Visualizing a sparse autoencoder trained on 10x10 image patches