vbm683 machine learning
TRANSCRIPT
![Page 1: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/1.jpg)
VBM683
Machine Learning
Pinar Duygulu
Slides are adapted from
Dhruv Batra (Virginia Tech),
J. Elder
![Page 2: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/2.jpg)
New Topic: Neural Networks
(C) Dhruv Batra 2
![Page 3: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/3.jpg)
Two class discriminant function
![Page 4: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/4.jpg)
Two class discriminant function
![Page 5: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/5.jpg)
Perceptron
![Page 6: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/6.jpg)
Synonyms
• Neural Networks
• Artificial Neural Network (ANN)
• Feed-forward Networks
• Multilayer Perceptrons (MLP)
• Types of ANN
– Convolutional Nets
– Autoencoders
– Recurrent Neural Nets
• [Back with a new name]: Deep Nets / Deep Learning
(C) Dhruv Batra 6
![Page 7: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/7.jpg)
Biological Neuron
(C) Dhruv Batra 7
![Page 8: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/8.jpg)
The Neuron Metaphor
• Neurons
– accept information from multiple inputs,
– transmit information to other neurons.
• Multiply inputs by weights along edges
• Apply some function to the set of inputs at each node
8Slide Credit: HKUST
![Page 9: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/9.jpg)
Types of Neurons
Linear Neuron
Logistic Neuron
Perceptron
Potentially more. Require a convex
loss function for gradient descent training.
9Slide Credit: HKUST
![Page 10: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/10.jpg)
Generalized linear models
![Page 11: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/11.jpg)
Sigmoid
-6 -4 -2 0 2 4 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-6 -4 -2 0 2 4 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-6 -4 -2 0 2 4 60
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
w0=0, w1=1w0=2, w1=1 w0=0, w1=0.5
Slide Credit: Carlos Guestrin(C) Dhruv Batra 11
![Page 12: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/12.jpg)
Case 1: Linearly separable inputs
![Page 13: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/13.jpg)
The Perceptron algorithm
![Page 14: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/14.jpg)
The Perceptron algorithm
![Page 15: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/15.jpg)
The Perceptron criterion
![Page 16: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/16.jpg)
The Perceptron criterion
![Page 17: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/17.jpg)
The Perceptron algorithm
![Page 18: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/18.jpg)
The Perceptron algorithm
![Page 19: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/19.jpg)
The Perceptron algorithm
![Page 20: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/20.jpg)
Not monotonic
![Page 21: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/21.jpg)
The perceptron convergence
theorem
![Page 22: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/22.jpg)
Example
![Page 23: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/23.jpg)
The first learning machine
![Page 24: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/24.jpg)
Practical limitations
![Page 25: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/25.jpg)
Generalization to not linearly
separable inputs
![Page 26: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/26.jpg)
Generalization to multiclass
problems
![Page 27: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/27.jpg)
K>2 classes
![Page 28: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/28.jpg)
K>2 classes
![Page 29: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/29.jpg)
K>2 classes
![Page 30: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/30.jpg)
1-of-K coding scheme
![Page 31: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/31.jpg)
Computational limitations of
perceptrons
![Page 32: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/32.jpg)
Limitation
• A single “neuron” is still a linear decision boundary
• What to do?
• Idea: Stack a bunch of them together!
(C) Dhruv Batra 32
![Page 33: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/33.jpg)
Multilayer Networks
• Cascade Neurons together
• The output from one layer is the input to the next
• Each Layer has its own sets of weights
33Slide Credit: HKUST
![Page 34: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/34.jpg)
Multi-layer perceptrons
![Page 35: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/35.jpg)
Universal Function Approximators
• Theorem
– 3-layer network with linear outputs can uniformly
approximate any continuous function to arbitrary accuracy,
given enough hidden units [Funahashi ’89]
(C) Dhruv Batra 35
![Page 36: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/36.jpg)
Feed-Forward Networks
• Predictions are fed forward through the network to
classify
36Slide Credit: HKUST
![Page 37: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/37.jpg)
Feed-Forward Networks
• Predictions are fed forward through the network to
classify
37Slide Credit: HKUST
![Page 38: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/38.jpg)
Feed-Forward Networks
• Predictions are fed forward through the network to
classify
38Slide Credit: HKUST
![Page 39: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/39.jpg)
Feed-Forward Networks
• Predictions are fed forward through the network to
classify
39Slide Credit: HKUST
![Page 40: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/40.jpg)
Feed-Forward Networks
• Predictions are fed forward through the network to
classify
40Slide Credit: HKUST
![Page 41: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/41.jpg)
Feed-Forward Networks
• Predictions are fed forward through the network to
classify
41Slide Credit: HKUST
![Page 42: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/42.jpg)
Implementing logical relations
![Page 43: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/43.jpg)
XOR problem
![Page 44: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/44.jpg)
Combining two linear classifiers
![Page 45: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/45.jpg)
Combining two linear classifiers
![Page 46: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/46.jpg)
Combining two linear classifiers
![Page 47: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/47.jpg)
Two-layer perceptron
![Page 48: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/48.jpg)
Two-layer perceptron
![Page 49: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/49.jpg)
Two-layer perceptron
![Page 50: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/50.jpg)
Two-layer perceptron
![Page 51: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/51.jpg)
Higher dimensions
![Page 52: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/52.jpg)
Higher dimensions
![Page 53: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/53.jpg)
Two-layer perceptron
![Page 54: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/54.jpg)
Two layer perceptron
![Page 55: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/55.jpg)
Limitations of a two-layer
perceptron
![Page 56: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/56.jpg)
Three layer perceptron
![Page 57: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/57.jpg)
Three layer perceptron
![Page 58: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/58.jpg)
Three layer perceptron
![Page 59: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/59.jpg)
Learning parameters – Training
data
![Page 60: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/60.jpg)
Choosing an activation function
![Page 61: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/61.jpg)
Smooth activation function
![Page 62: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/62.jpg)
Output : Two classes
![Page 63: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/63.jpg)
Output: K> 2 classes
![Page 64: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/64.jpg)
Non-convex
![Page 65: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/65.jpg)
Backpropagation algorithm
![Page 66: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/66.jpg)
Notation
![Page 67: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/67.jpg)
Notation
![Page 68: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/68.jpg)
Input
![Page 69: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/69.jpg)
Notation
![Page 70: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/70.jpg)
Cost function
![Page 71: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/71.jpg)
Cost function
![Page 72: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/72.jpg)
Gradient descent
![Page 73: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/73.jpg)
Gradient descent
![Page 74: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/74.jpg)
Gradient descent
![Page 75: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/75.jpg)
Backpropagation: Output layer
![Page 76: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/76.jpg)
Backpropagation: Hidden layers
![Page 77: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/77.jpg)
Backpropagation: Summary of
the algorithm
![Page 78: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/78.jpg)
Batch versus online learning
![Page 79: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/79.jpg)
Batch versus online learning
![Page 80: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/80.jpg)
Neural Nets
• Best performers on OCR
– http://yann.lecun.com/exdb/lenet/index.html
• NetTalk
– Text to Speech system from 1987
– http://youtu.be/tXMaFhO6dIY?t=45m15s
• Rick Rashid speaks Mandarin
– http://youtu.be/Nu-nlQqFCKg?t=7m30s
(C) Dhruv Batra 80
![Page 81: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/81.jpg)
Convergence of backprop
• Perceptron leads to convex optimization
– Gradient descent reaches global minima
• Multilayer neural nets not convex
– Gradient descent gets stuck in local minima
– Hard to set learning rate
– Selecting number of hidden units and layers = fuzzy
process
– NNs had fallen out of fashion in 90s, early 2000s
– Back with a new name and significantly improved
performance!!!!
• Deep networks
– Dropout and trained on much larger corpus
Slide Credit: Carlos Guestrin(C) Dhruv Batra 81
![Page 82: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/82.jpg)
Convolutional Nets
• Example:
– http://yann.lecun.com/exdb/lenet/index.html
(C) Dhruv Batra 82
INPUT 32x32
Convolutions SubsamplingConvolutions
C1: feature maps 6@28x28
Subsampling
S2: f. maps6@14x14
S4: f. maps 16@5x5
C5: layer120
C3: f. maps 16@10x10
F6: layer 84
Full connection
Full connection
Gaussian connections
OUTPUT 10
Image Credit: Yann LeCun, Kevin Murphy
![Page 83: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/83.jpg)
(C) Dhruv Batra 83Slide Credit: Marc'Aurelio Ranzato
![Page 84: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/84.jpg)
(C) Dhruv Batra 84Slide Credit: Marc'Aurelio Ranzato
![Page 85: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/85.jpg)
(C) Dhruv Batra 85Slide Credit: Marc'Aurelio Ranzato
![Page 86: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/86.jpg)
Visualizing Learned Filters
(C) Dhruv Batra 86Figure Credit: [Zeiler & Fergus ECCV14]
![Page 87: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/87.jpg)
Visualizing Learned Filters
(C) Dhruv Batra 87Figure Credit: [Zeiler & Fergus ECCV14]
![Page 88: VBM683 Machine Learning](https://reader033.vdocument.in/reader033/viewer/2022050512/6271d642a31f6f6a132bdd2e/html5/thumbnails/88.jpg)
Visualizing Learned Filters
(C) Dhruv Batra 88Figure Credit: [Zeiler & Fergus ECCV14]