Supervised Learning in Neural Networks
Sumio Watanabe, Tokyo Institute of Technology
Advanced Topics in Mathematical Information Sciences II
April 24, May 1, 2015
Quick Review
2015/5/1 Mathematical Learning Theory
Supervised Learning

[Diagram] A supervisor produces samples X1, X2, …, Xn with answers Y1, Y2, …, Yn; the learner models the relation as Y = f(x, w).
Mathematics of Supervised Learning

[Diagram] A true information source q(x, y) generates the training samples X1, X2, …, Xn with Y1, Y2, …, Yn and the test samples X, Y; the neural network y = f(x, w) is trained on the former and evaluated on the latter.
One Neuron Model

[Diagram] Inputs x1, x2, x3, …, xN enter through synapse weights w1, w2, w3, …, wN; the neuron forms the weighted sum and applies the activation with bias θ:

Output = σ( Σ_{i=1}^{N} w_i x_i + θ )
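The one-neuron model above can be sketched in a few lines. The slides leave σ abstract; the logistic sigmoid used here is an assumption, and the input and weight values are made up for illustration:

```python
import numpy as np

def neuron(x, w, theta):
    """One-neuron model: output = sigma(sum_i w_i * x_i + theta).
    sigma is taken as the logistic sigmoid (an assumption; the
    slides do not fix the activation function)."""
    s = np.dot(w, x) + theta          # weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-s))   # squashing nonlinearity

# Toy example with N = 3 inputs (values are hypothetical)
x = np.array([1.0, 0.5, -1.0])
w = np.array([0.2, -0.4, 0.1])
out = neuron(x, w, theta=0.0)
```

With a sigmoid activation the output always lies strictly between 0 and 1, and a zero weighted sum gives exactly 0.5.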
Three-Layered Neural Network

[Diagram] Input layer x1, x2, …, xM; hidden layer; output layer f1, f2, …, fN.
Contents

1. Deep neural network
2. Sequential learning and auto-encoder
3. Convolution learning
Deep neural network (DNN)

Recently, neural networks with deep layers have been studied intensively. It is reported that DNNs have better generalization performance.

[Diagram] Networks of increasing depth, from 1960 through 1985 to 2015, each mapping inputs x1, x2, …, xM to outputs f1, f2, …, fN.
Definition

It is easy to define a deep network.

Simple perceptron:
f_i = σ( Σ_{j=1}^{M} u_ij x_j + θ_i )

Three-layer neural network:
f_i = σ( Σ_{j=1}^{H} u_ij σ( Σ_{k=1}^{M} w_jk x_k + θ_j ) + φ_i )

DNN:
f_i = σ( Σ_{j=1}^{H1} u_ij σ( Σ_{k=1}^{H2} w_jk σ( Σ_{l=1}^{M} v_kl ( … ) ) + θ_j ) + φ_i )
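The nested sums above are just repeated application of one layer map. A minimal forward-pass sketch, assuming a logistic sigmoid for σ and arbitrary toy dimensions (both are assumptions, not fixed by the slides):

```python
import numpy as np

def sigma(s):
    # Logistic sigmoid (assumed; the slides leave sigma abstract)
    return 1.0 / (1.0 + np.exp(-s))

def layer(W, theta, x):
    # One layer: sigma(W x + theta), matching one level of the nested sums
    return sigma(W @ x + theta)

def forward(params, x):
    """Stack layers: one entry gives a simple perceptron, two give the
    three-layer network, more give a DNN."""
    for W, theta in params:
        x = layer(W, theta, x)
    return x

# Toy DNN with widths M=4 -> H1=5 -> H2=3 -> N=2 (hypothetical sizes)
rng = np.random.default_rng(0)
M, H1, H2, N = 4, 5, 3, 2
dnn = [(rng.normal(size=(H1, M)), rng.normal(size=H1)),
       (rng.normal(size=(H2, H1)), rng.normal(size=H2)),
       (rng.normal(size=(N, H2)), rng.normal(size=N))]
y = forward(dnn, rng.normal(size=M))
```

Each additional `(W, theta)` pair in the list adds one level of nesting in the formula.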
Learning and Generalization

Training Error:
E(w) = (1/n) Σ_{i=1}^{n} ( Y_i − f(X_i, w) )²

Generalization Error:
G(w) = ∫∫ ( y − f(x, w) )² q(x, y) dx dy

The main purpose of learning is to minimize G(w), but we have only training samples. Minimizing E(w) is not equivalent to minimizing G(w).
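A small numerical sketch of the two errors: E(w) is computed on the n training samples, while G(w), which has no closed form in general, is approximated here by a large held-out sample from the same source. The true source, model, and sample sizes below are all hypothetical:

```python
import numpy as np

def training_error(f, w, X, Y):
    # E(w) = (1/n) * sum_i (Y_i - f(X_i, w))^2
    return np.mean((Y - f(X, w)) ** 2)

# Hypothetical setup: true source y = sin(x) + noise, model f(x, w) = w * x
rng = np.random.default_rng(1)
f = lambda X, w: w * X
X_train = rng.uniform(-1, 1, 50)
Y_train = np.sin(X_train) + 0.1 * rng.normal(size=50)

# Monte Carlo approximation of G(w) via a large independent sample
X_test = rng.uniform(-1, 1, 100000)
Y_test = np.sin(X_test) + 0.1 * rng.normal(size=100000)

E = training_error(f, 1.0, X_train, Y_train)
G = training_error(f, 1.0, X_test, Y_test)
```

In practice only E(w) is observable; the gap between E and G is exactly the point the slide makes.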
Steepest Descent: Error Back-Propagation

Square error:
E(w) = (1/2) Σ_{i=1}^{N} ( f_i − y_i )²

Inference:
o_j = σ( Σ_{k=1}^{M} w_jk o_k + θ_j ),   f_i = σ( Σ_{j=1}^{H} u_ij o_j + φ_i )

Gradient:
∂E/∂w_jk = Σ_{i=1}^{N} ( f_i − y_i ) ∂f_i/∂w_jk

∂f_i/∂w_jk = ( ∂f_i/∂o_j ) ( ∂o_j/∂w_jk )

All parameters can be optimized by steepest descent of E(w) by applying this chain rule recursively.
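The chain-rule recursion can be sketched for a three-layer network. Logistic-sigmoid units (so σ′ = σ(1 − σ)), the learning rate eta, and the toy dimensions are all assumptions for illustration:

```python
import numpy as np

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(W, theta, U, phi, x, y, eta=0.1):
    """One steepest-descent step for o = sigma(W x + theta),
    f = sigma(U o + phi), E = (1/2) * sum_i (f_i - y_i)^2."""
    o = sigma(W @ x + theta)
    f = sigma(U @ o + phi)
    # Output layer: dE/dU_ij = (f_i - y_i) * f_i * (1 - f_i) * o_j
    delta_f = (f - y) * f * (1 - f)
    # Hidden layer: propagate the error back through U (chain rule)
    delta_o = (U.T @ delta_f) * o * (1 - o)
    # Steepest-descent updates for all parameters
    U -= eta * np.outer(delta_f, o)
    phi -= eta * delta_f
    W -= eta * np.outer(delta_o, x)
    theta -= eta * delta_o
    return 0.5 * np.sum((f - y) ** 2)

# Fit a single toy sample (dimensions M=3, H=4, N=2 are hypothetical)
rng = np.random.default_rng(0)
M, H, N = 3, 4, 2
W, theta = rng.normal(size=(H, M)), np.zeros(H)
U, phi = rng.normal(size=(N, H)), np.zeros(N)
x, y = rng.normal(size=M), np.array([0.0, 1.0])
errs = [backprop_step(W, theta, U, phi, x, y) for _ in range(200)]
```

Repeating the step drives the square error down, illustrating that the recursion computes a usable gradient for every layer.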
Regularization: Ridge and Lasso

E(w) = (1/n) Σ_{i=1}^{n} ( Y_i − f(X_i, w) )² + R(w)

Ridge: R(w) = λ Σ_j |w_j|²
Lasso: R(w) = λ Σ_j |w_j|

λ > 0 is a hyperparameter.

A DNN has many parameters to be optimized, so regularization terms are necessary.

Remark. It is still difficult to find the optimal hyperparameter.
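The two penalty terms are one-liners; the weight vector and λ below are hypothetical values for illustration:

```python
import numpy as np

def ridge(w, lam):
    # Ridge penalty: R(w) = lambda * sum_j |w_j|^2
    return lam * np.sum(w ** 2)

def lasso(w, lam):
    # Lasso penalty: R(w) = lambda * sum_j |w_j|
    return lam * np.sum(np.abs(w))

w = np.array([0.5, -2.0, 0.0])   # toy weight vector
r = ridge(w, 0.1)                # 0.1 * (0.25 + 4.0 + 0.0) = 0.425
l = lasso(w, 0.1)                # 0.1 * (0.5 + 2.0 + 0.0) = 0.25
```

Either penalty is simply added to the training error E(w) before taking gradients; the lasso's absolute value is what tends to push individual weights exactly to zero.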
Steepest Descent?

[Diagram] A deep network with inputs x1, x2, …, xM and outputs f1, f2, …, fN; the supervised data are given at the outputs, which are far from the inputs.

Minimize this error by optimizing all parameters. Mathematically speaking, all parameters can be optimized by steepest descent, but it is difficult for a neural network to find the nonlinear relation between distant inputs and outputs.

We need a methodology for building a deep neural network.
Contents

1. Deep neural network
2. Sequential learning and auto-encoder
3. Convolution learning
Deep Learning Methodology

Three methods are being studied:

(1) Sequential layer learning
(2) Auto-encoder
(3) Convolution network
(1) Sequential Layer Learning

[Diagram] Three networks of increasing depth, each mapping x1, x2, …, xM to f1, f2, …, fN and each trained with the supervisor; arrows labeled "copy" lead from one network to the next.

Synapse weights in the lower layers are copied from a trained shallow network to a deeper one.
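The copy-and-grow step can be sketched as follows. The representation of a network as a list of `(W, theta)` pairs and the initialization scale are assumptions for illustration; training itself is omitted:

```python
import numpy as np

def grow_network(trained_layers, new_hidden, rng):
    """Sequential layer learning (sketch): keep the trained lower
    layers as-is and stack a freshly initialized layer on top, which
    is then trained next."""
    H_prev = trained_layers[-1][0].shape[0]   # width of the current top layer
    new_W = rng.normal(scale=0.1, size=(new_hidden, H_prev))
    new_theta = np.zeros(new_hidden)
    # Lower weights are copied (shared), only the new layer is fresh
    return trained_layers + [(new_W, new_theta)]

rng = np.random.default_rng(0)
# One already-trained hidden layer of width 5 over 4 inputs (toy values)
shallow = [(rng.normal(size=(5, 4)), np.zeros(5))]
deeper = grow_network(shallow, new_hidden=3, rng=rng)
```

Repeating `grow_network` and retraining after each step builds the deep network layer by layer, rather than optimizing all layers from scratch.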
Parameter Space

[Diagram] The error surface E(w); w lies in a high-dimensional Euclidean space with many local and complicated structures.

E(w) is minimized at |w| = infinity, so we need an appropriate finite and local parameter. Sequential layer learning may lead the training result to some appropriate point.
(2) Auto-encoder

First, a bottleneck network is trained, then its weights are copied.

[Diagram] A network maps inputs X1, X2, …, XM through a hidden layer smaller than M back to X1, X2, …, XM; the input itself serves as the supervisor. The trained lower layers are then reused in a network with outputs f1, f2, …, fN.
Bottleneck Neural Network

[Diagram] Inputs X1, X2, …, XM lying on a K-dimensional manifold in the M-dimensional Euclidean space; the network reproduces the same inputs at its output.

If the inputs lie on a K-dimensional manifold in the M-dimensional Euclidean space, then their essential coordinates can be extracted automatically. This is nonlinear principal component analysis.
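The bottleneck structure can be sketched with one encoding and one decoding map. A sigmoid encoder, a linear decoder, and random (untrained) weights are assumptions here; in practice both maps are trained to minimize the reconstruction error:

```python
import numpy as np

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

def autoencode(x, W_enc, W_dec):
    """Bottleneck network: encode M inputs down to K < M hidden units,
    then decode back to M outputs; the input is its own supervisor."""
    h = sigma(W_enc @ x)     # K-dimensional code (essential coordinates)
    return W_dec @ h, h      # reconstruction and code

rng = np.random.default_rng(0)
M, K = 6, 2                  # bottleneck: K is smaller than M
W_enc = rng.normal(size=(K, M))
W_dec = rng.normal(size=(M, K))
x = rng.normal(size=M)
x_hat, code = autoencode(x, W_enc, W_dec)
recon_error = np.sum((x - x_hat) ** 2)  # quantity minimized during training
```

When the inputs really do lie on a K-dimensional manifold, minimizing `recon_error` over many samples forces the K hidden units to act as coordinates on that manifold.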
Example

Input: 5 × 5 images of "0" and "6". Training samples: 2000. Test samples: 2000.

Network: Input 25 – Hidden 8 – Hidden 6 – Hidden 4 – Output 2.
(0) Only Error Back-Propagation

Training Error: mean 213.5, std 414.7
Generalization Error: mean 265.5, std 388.0

Training results strongly depend on the initial synapse weights.
(1) Sequential Layer Learning

Training error: mean 4.1, std 1.8
Test error: mean 61.6, std 7.0
(2) Auto-encoder

Training error: mean 5.3, std 3.4
Test error: mean 61.3, std 8.1
Contents

1. Deep neural network
2. Sequential learning and auto-encoder
3. Convolution learning
Data Structure

In several kinds of data, such as images and time series, neighborhoods have local covariance.

Image: a pixel depends on its neighbors.
Time series: a future value can be predicted from the past.

A convolutional network is useful for analyzing such data:

f_i = σ( Σ_{|i−j|<3} u_ij σ( Σ_{|j−k|<3} w_jk x_k + θ_j ) + φ_i )

Synapse weights outside the neighborhood are zero.
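The constraint w_jk = 0 for |j − k| ≥ 3 amounts to a banded weight matrix, which can be enforced with a mask. The matrix size and neighborhood width below are the slide's width of 3 on toy dimensions:

```python
import numpy as np

def banded_mask(rows, cols, width=3):
    """Mask that is 1 where |j - k| < width and 0 elsewhere, enforcing
    the convolutional constraint that weights outside the neighborhood
    are zero."""
    j = np.arange(rows)[:, None]
    k = np.arange(cols)[None, :]
    return (np.abs(j - k) < width).astype(float)

# Apply the mask to a dense random weight matrix (toy 6x6 layer)
W = np.random.default_rng(0).normal(size=(6, 6)) * banded_mask(6, 6)
```

Multiplying gradients by the same mask during training keeps the far-off entries at zero, so each unit only ever sees its local neighborhood.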
Convolutional Network

In image analysis, a network is made by nonlinear convolution processing from local information to global information.
Multi-Resolution Analysis

Multi-resolution analysis (MRA) is a method of analyzing images by integration from local to global data. A convolution network can be understood as a kind of MRA.
Time Delay Neural Network

Human speech contains local abbreviation, expansion, and contraction. A layered neural network was proposed to adapt to such local nonlinear changes; this is called the time delay neural network (TDNN).

[Diagram] Speech sound over time is mapped to a recognition result over time.
Example: Time Series

Time series prediction problem: find a nonlinear function

x(t) = f( x(t−1), x(t−2), …, x(t−27) ) + noise,

where { x(t) } is the set of monthly prices of hakusai (a Japanese vegetable like cabbage) for 1970–2013.

Before processing, a linear prediction was optimized:

x(t) = a_1 x(t−1) + a_2 x(t−2) + … + a_27 x(t−27).

Linear prediction: Training Error 1.29, Generalization Error 1.55.
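The linear baseline above is an ordinary least-squares fit of the lag coefficients. Since the hakusai price series is not reproduced here, the sketch uses a synthetic seasonal series and a smaller order (12 rather than 27); both are assumptions:

```python
import numpy as np

def fit_linear_predictor(x, order):
    """Least-squares fit of x(t) = a_1 x(t-1) + ... + a_p x(t-p),
    the linear baseline on the slide."""
    # Each row holds the most recent `order` past values for one target
    rows = [x[t - order:t][::-1] for t in range(order, len(x))]
    A, b = np.array(rows), x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

# Synthetic monthly-looking series with period 12 (hypothetical data)
rng = np.random.default_rng(0)
t = np.arange(300)
x = np.sin(2 * np.pi * t / 12) + 0.05 * rng.normal(size=300)

a = fit_linear_predictor(x, order=12)
pred = np.array([x[s - 12:s][::-1] @ a for s in range(12, 300)])
mse = np.mean((x[12:] - pred) ** 2)
```

On a strongly seasonal series the linear predictor already captures most of the structure; the deep and convolutional models on the next slides are attempts to improve on exactly this baseline.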
Example

[Figure] Training result and test result: price vs. month; red: true, blue: prediction.

The data in e-Stat of the Japanese Government are used: http://www.e-stat.go.jp/SG1/estat/eStatTopPortal.do
Comparison of DNN and ConvNN

[Figure] Predictions over time for a deep neural network and a convolution network.

Deep neural network: Training Error 1.01, Generalization Error 1.56
Convolution network: Training Error 1.28, Generalization Error 1.35
Deep Learning and Feature Extraction

(1) Automatic extraction of features. By using a deep neural network, the optimal feature representation may be found; discovery of unknown structure enables us to "mine data". However, it may be difficult, and even when possible it needs heavy computational costs.

(2) Preparing features by hand. If an appropriate feature is found by a human before training, then the computational cost of learning can be reduced. However, discovery of unknown features does not occur.
Summary
(1) Supervised learning in neural networks was introduced (April 24th):
  (a) Definitions of training and generalization errors
  (b) Steepest descent as a learning algorithm

(2) Methodology of deep neural networks (May 1st):
  (a) Sequential layer learning
  (b) Auto-encoder
  (c) Convolution network