(Artificial) Neural Networks: From Perceptron to MLP
Industrial AI Lab.
Prof. Seungchul Lee
Perceptron: Example
Perceptron: Forward Propagation
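The worked example on these slides lived in figures that did not survive extraction. As a stand-in, here is a minimal NumPy sketch of a perceptron's forward propagation; the weights, bias, and the AND task are illustrative assumptions, not values from the slides.

```python
import numpy as np

def perceptron_forward(x, w, b):
    """Forward propagation of a single perceptron: step(w . x + b)."""
    z = np.dot(w, x) + b        # weighted sum of inputs plus bias
    return 1 if z > 0 else 0    # hard-threshold (step) activation

# Illustrative weights that realize a logical AND of two binary inputs
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, '->', perceptron_forward(np.array(x), w, b))
```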
From Perceptron to MLP
XOR Problem
• Minsky-Papert Controversy on XOR
  – Not linearly separable
  – Limitation of the perceptron
• Single neuron = one linear classification boundary
Artificial Neural Networks: MLP
• Multi-layer Perceptron (MLP) = Artificial Neural Network (ANN)
  – Multiple neurons = multiple linear classification boundaries (see the sketch below)
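As an illustration of this point, a minimal TensorFlow sketch that fits a two-hidden-neuron MLP to XOR. The architecture, optimizer, and epoch count are my assumptions, not the lecture's code, and an unlucky random initialization can occasionally fail to converge.

```python
import numpy as np
import tensorflow as tf

# XOR: the four points cannot be split by one linear boundary
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Two hidden neurons give two linear boundaries that the output combines
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation='tanh', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1),
              loss='binary_crossentropy')
model.fit(X, y, epochs=500, verbose=0)
print(model.predict(X, verbose=0).round().ravel())  # typically [0, 1, 1, 0]
```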
Artificial Neural Networks: Activation Function
• Differentiable nonlinear activation function
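Which functions the slide plots is not recoverable from the transcript; as an assumption, here is a sketch of three common choices and their derivatives (ReLU is differentiable almost everywhere, with a subgradient at zero):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # d/dz sigmoid(z)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2    # d/dz tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)    # subgradient; not differentiable at 0

z = np.linspace(-2.0, 2.0, 5)
print(sigmoid(z), sigmoid_grad(z))
print(np.tanh(z), tanh_grad(z))
print(relu(z), relu_grad(z))
```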
Artificial Neural Networks
• In a compact representation
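The compact representation itself was an image; in standard matrix notation (my notation, an assumption about what the slide shows), one layer and a two-layer network read:

```latex
h = \sigma(W_1 x + b_1), \qquad \hat{y} = \sigma(W_2 h + b_2)
```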
Artificial Neural Networks
• A single layer is not enough to represent a complex relationship between input and output
  ⟹ perceptron with many layers and units
• Multi-layer perceptron
  – Features of features
  – Mappings of mappings
Another Perspective: ANN as Kernel Learning
Neuron
• We can represent this “neuron” as follows:
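The representation referred to here was a figure; a standard way to write a single neuron (my notation, an assumption) is:

```latex
y = \sigma\Big(\sum_i w_i x_i + b\Big) = \sigma(w^{\top} x + b)
```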
XOR Problem
• The main weakness of linear predictors is their lack of capacity.
• For classification, the populations have to be linearly separable.
Nonlinear Mapping
• The XOR example can be solved by pre-processing the data to make the two populations linearly separable, as in the sketch below.
Source: Dr. Francois Fleuret, EPFL
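The pre-processing shown on the slide is a figure; one concrete choice (a hypothetical feature map, possibly different from the lecture's) is 𝜙(x₁, x₂) = (x₁, x₂, x₁x₂), under which XOR becomes linearly separable:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hypothetical feature map: append the product x1 * x2
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# In phi-space the plane x1 + x2 - 2*(x1*x2) - 0.5 = 0 separates the classes
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print((phi @ w + b > 0).astype(int))  # [0, 1, 1, 0], matching y
```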
Kernel
• Often we want to capture nonlinear patterns in data
  – Nonlinear regression: the input-output relationship may not be linear
  – Nonlinear classification: classes may not be separable by a linear boundary
• Linear models (e.g., linear regression, linear SVM) are just not rich enough
• Kernels make linear models work in nonlinear settings
  – Map the data to a higher-dimensional space where it exhibits linear patterns
  – Apply the linear model in the new input feature space
  – Mapping = changing the feature representation
Kernel + Neuron
• Nonlinear mapping 𝜙 (kernel) + neuron ⟹ kernel perceptron
Neuron + Neuron
• The nonlinear mapping 𝜙 (the kernel) can itself be represented by other neurons
• Nonlinear kernel
  – Nonlinear activation functions
Multi Layer Perceptron
• The nonlinear mapping 𝜙 (the kernel) can be represented by other neurons
• We can generalize this construction to an MLP
Summary
• Universal function approximator
• Universal function classifier
• Parameterized
Deep Artificial Neural Networks
• Complex/Nonlinear universal function approximator
– Linearly connected networks
– Simple nonlinear neurons
[Figure: layered network diagram; input → linear connections alternating with nonlinear neurons → output, split into a feature-learning stage and a classification stage (Class 1 / Class 2)]
Looking at Parameters
Logistic Regression in the Form of a Neural Network
• Neural network convention: bias units are not drawn
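In other words, logistic regression is a single neuron with a sigmoid activation. A minimal NumPy sketch, where the input, weights, and bias are illustrative values of my choosing:

```python
import numpy as np

def logistic_neuron(x, w, b):
    """Logistic regression = one neuron: sigmoid(w . x + b)."""
    z = np.dot(w, x) + b              # linear part, as in a perceptron
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid turns it into a probability

x = np.array([0.5, -1.2])   # illustrative input
w = np.array([2.0, 1.0])    # illustrative weights
b = 0.1
print('P(y=1 | x) =', logistic_neuron(x, w, b))
```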
Nonlinearly Distributed Data
• An example to understand the network's behavior
  – Include a hidden layer
Multi Layers
• 𝑧 space
Multi Layers
• 𝑥 space
(Artificial) Neural Networks: Training
Industrial AI
Prof. Seungchul Lee
Training Neural Networks: Loss Function
• Measures error between target values and predictions
• Examples (standard forms below)
  – Squared loss (for regression)
  – Cross entropy (for classification)
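The formulas on the slide were images; the standard forms, in my notation with targets yᵢ and predictions ŷᵢ over N examples, are:

```latex
% Squared loss (regression)
\ell = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2

% Cross entropy (classification over K classes, one-hot targets)
\ell = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{i,k} \log \hat{y}_{i,k}
```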
Gradients in ANN
• Learning weights and biases from data using gradient descent
• 𝜕ℓ/𝜕𝜔: computed naively, too many computations are required to obtain this for every weight 𝜔
• The structure of a NN makes it tractable (a sketch follows):
  – Composition of functions
  – Chain rule
  – Dynamic programming
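A minimal NumPy sketch of those three ideas working together: manual backpropagation through a two-layer network. The shapes, random data, and squared loss are illustrative assumptions, not the lecture's example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))             # 4 examples, 3 features (illustrative)
y = rng.normal(size=(4, 1))             # regression targets
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# Forward pass: a composition of functions, caching intermediates
z1 = x @ W1 + b1
h = np.tanh(z1)                         # nonlinear activation
y_hat = h @ W2 + b2
loss = np.mean((y_hat - y) ** 2)        # squared loss

# Backward pass: chain rule, reusing cached values (dynamic programming)
d_yhat = 2 * (y_hat - y) / len(y)       # dloss/dy_hat
dW2 = h.T @ d_yhat                      # through y_hat = h @ W2 + b2
db2 = d_yhat.sum(axis=0)
dh = d_yhat @ W2.T
dz1 = dh * (1 - np.tanh(z1) ** 2)       # through h = tanh(z1)
dW1 = x.T @ dz1                         # through z1 = x @ W1 + b1
db1 = dz1.sum(axis=0)
print(loss, dW1.shape, db1.shape, dW2.shape, db2.shape)
```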
Training Neural Networks with TensorFlow
• Optimization procedure
• It is not easy to numerically compute gradients in a general network.
  – The good news: people have already done all the "hard work" of developing numerical solvers (or libraries)
  – There is a wide range of tools → we will use TensorFlow (sketched below)
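As a sketch of what such a library does for us, TensorFlow's GradientTape computes gradients of a loss automatically; this toy snippet is illustrative, not the lecture's code:

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(0.5)
x, y = tf.constant(1.5), tf.constant(4.0)

with tf.GradientTape() as tape:
    y_hat = w * x + b            # a tiny linear model
    loss = (y_hat - y) ** 2      # squared loss

# TensorFlow applies the chain rule through the recorded computation
dw, db = tape.gradient(loss, [w, b])
print(dw.numpy(), db.numpy())    # -1.5 and -1.0 for these values
```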
ANN in TensorFlow: MNIST
MNIST database
• Modified National Institute of Standards and Technology database
• Handwritten digit database
• 28 × 28 grayscale images
• Each image is flattened into a vector of 28 × 28 = 784 values
Our Network Model
• Input image (28 × 28), flattened
• Input layer (784) → hidden layer (100) → output layer (10)
• Digit prediction in one-hot encoding
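A minimal Keras sketch of this 784-100-10 network; the ReLU hidden activation, Adam optimizer, and epoch count are my assumptions, and the lecture's code may differ:

```python
import tensorflow as tf

# MNIST: 28 x 28 grayscale digits, labels 0-9
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train_oh = tf.keras.utils.to_categorical(y_train, 10)  # one-hot targets
y_test_oh = tf.keras.utils.to_categorical(y_test, 10)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28 x 28 -> 784
    tf.keras.layers.Dense(100, activation='relu'),    # hidden layer (100)
    tf.keras.layers.Dense(10, activation='softmax'),  # output layer (10)
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train_oh, epochs=5, verbose=0)
print(model.evaluate(x_test, y_test_oh, verbose=0))   # [loss, accuracy]
```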
Machine Learning and Deep Learning
Prof. Seungchul Lee
Industrial AI Lab.
Supervised Learning
[Figure: labeled training data (e.g., images of chairs and desks) used to learn a model that generalizes to unseen examples]
Machine Learning
• Image of digit 0
• Image of digit 1
Machine Learning vs. Deep Learning
• "Selecting good feature variables is important." → requires expert knowledge
• Machine learning: Data → Features → Decision
  – Signal processing + domain knowledge
• Deep learning: Data → Decision
  – Depends less on domain knowledge
  – AI-discovered features
• "Deep learning is less dependent on domain knowledge than machine learning."
Deep Learning
• Convolutional Neural Networks (CNN)
• Image pattern recognition problems
[Figure: CNN taking an image as input and outputting digit class scores (0, 1, …)]
Handwritten Digit Recognition
Ind
ust
rial
AI L
ab.
45
AI Core vs. AI + X
Artificial Intelligence and Engineering
• Generally correct in ideal cases
• Difficult to reflect real-world uncertainties