Artificial Neural Network
Yalong Li
Some slides are from http://www.cs.cmu.edu/~tom/10701_sp11/slides/NNets-701-3_24_2011_ann.pdf
Structure
• Motivation
• Artificial neural networks
• Learning: Backpropagation Algorithm
• Overfitting
• Expressive Capabilities of ANNs
• Summary
Some facts about our brain
• Performance tends to degrade gracefully under partial damage
• Learns (reorganizes itself) from experience
• Recovery from damage is possible
• Performs massively parallel computations extremely efficiently
• For example, complex visual perception occurs in less than 100 ms, that is, about 10 processing steps! (synapses operate at roughly 100 Hz)
• Supports our intelligence and self-awareness
Neural Networks in the Brain
• Cortex, midbrain, brainstem and cerebellum
• Visual system
  • 10 or 11 processing stages have been identified
  • Feedforward: earlier processing stages (near the sensory input) feed later ones (near the motor output)
  • Feedback: later stages also project back to earlier ones
Neurons and Synapses
• The basic computational unit in the nervous system is the nerve cell, or neuron.
Synaptic Learning
• One way the brain learns is by altering the strengths of connections between neurons, and by adding or deleting connections between neurons
• LTP (long-term potentiation): an enduring (> 1 hour) increase in synaptic efficacy that results from high-frequency stimulation of an afferent (input) pathway
• The efficacy of a synapse can change as a result of experience, providing both memory and learning through long-term potentiation. One way this happens is through the release of more neurotransmitter.
• Hebb's postulate: "When an axon of cell A... excite[s] cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells so that A's efficiency as one of the cells firing B is increased."
• Points to note about LTP:
  • Synapses become more or less important over time (plasticity)
  • LTP is based on experience
  • LTP is based only on local information (Hebb's postulate)
Brain → ?
Structure
• Motivation
• Artificial neural networks
• Backpropagation Algorithm
• Overfitting
• Expressive Capabilities of ANNs
• Summary
Artificial Neural Networks to learn f: X → Y
• f might be a non-linear function
• X: (vector of) continuous and/or discrete variables
• Y: (vector of) continuous and/or discrete variables
• Represent f by a network of logistic units; each unit is a logistic function of its inputs

unit output: $o = \sigma\big(w_0 + \sum_i w_i x_i\big) = \dfrac{1}{1 + \exp\big(-(w_0 + \sum_i w_i x_i)\big)}$

• MLE: train the weights of all units to minimize the sum of squared errors of the predicted network outputs
• MAP: train to minimize the sum of squared errors plus a penalty on the weight magnitudes
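As a concrete illustration, here is a minimal sketch of one such logistic unit in Python with NumPy; the particular weights and inputs are made up for the example. The MAP variant above corresponds to adding a weight-decay penalty λ‖w‖² to the squared-error objective.

```python
import numpy as np

def sigmoid(a):
    """Logistic function: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def logistic_unit(x, w, w0):
    """Output of one unit: the logistic function of a weighted sum of inputs."""
    return sigmoid(w0 + np.dot(w, x))

# Made-up values, purely for illustration.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(logistic_unit(x, w, w0=0.2))  # a value in (0, 1)
```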
Artificial Neural Networks to learn f: X → Y

$$y(\mathbf{x}, \mathbf{w}) = f\Big(\sum_j w_j \phi_j(\mathbf{x})\Big)$$

• f(·) is a nonlinear activation function for classification, and the identity for regression
• The basis functions φ_j(x) depend on parameters, and these parameters are adjusted, along with the coefficients {w_j}, during training
• The sigmoid function can be the logistic function or tanh
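For reference, the two sigmoidal choices and the standard identity relating them (so either choice gives equally expressive networks, differing only in a linear rescaling of weights and biases):

$$\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}} = 2\sigma(2a) - 1.$$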
Artificial Neural Networks to learn f: X → Y
• a_j: activations
• h(·): a nonlinear activation function, determined by the nature of the data and the assumed distribution of the target variables
Artificial Neural Networks to learn f: X → Y

How to define the output-unit activation?
• For standard regression, it is the identity:
  $y_k = a_k$
• For multiple (independent) binary classifications, each output-unit activation is transformed using a logistic sigmoid function, so that:
  $y_k = \sigma(a_k)$, where $\sigma(a) = \dfrac{1}{1 + \exp(-a)}$
• For multiclass problems, a softmax activation of the form:
  $y_k = \dfrac{\exp(a_k)}{\sum_j \exp(a_j)}$
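A minimal sketch of the softmax in NumPy (the input activations are illustrative); subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(a):
    """Softmax over activations a_k: exp(a_k) / sum_j exp(a_j)."""
    e = np.exp(a - np.max(a))  # shift for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # the outputs sum to 1
```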
Artificial Neural Networks to learn f: X → Y

Why is that? There is a natural choice of both the output-unit activation function and the matching error function, according to the type of problem being solved.
• Regression: linear outputs, error = sum-of-squares error
• (Multiple independent) binary classifications: logistic sigmoid outputs, error = cross-entropy error function
• Multiclass classification: softmax outputs, error = multiclass cross-entropy error function
• Two classes? A single logistic sigmoid output with cross-entropy error suffices.

For each matched pair, the derivative of the error function with respect to the activation of a particular output unit takes the form
$$\frac{\partial E}{\partial a_k} = y_k - t_k.$$
A probabilistic interpretation of the network outputs is given in PRML (C. M. Bishop).
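A quick check of the regression case shows where this simple derivative comes from: with the sum-of-squares error and identity outputs,

$$E = \tfrac{1}{2}\sum_k (y_k - t_k)^2, \quad y_k = a_k \;\;\Rightarrow\;\; \frac{\partial E}{\partial a_k} = y_k - t_k.$$

The same $y_k - t_k$ form falls out for logistic sigmoid outputs paired with cross-entropy and for softmax outputs paired with the multiclass cross-entropy, which is exactly why these pairings are the natural ones.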
Multilayer Networks of Sigmoid Units
Connectionist Models
Consider humans:
• Neuron switching time ~0.001 second
• Number of neurons ~10^10
• Connections per neuron ~10^4–10^5
• Scene recognition time ~0.1 second
• 100 inference steps doesn't seem like enough → much parallel computation

Properties of artificial neural nets (ANNs):
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed processes
Structure
• Motivation
• Artificial neural networks
• Learning: Backpropagation Algorithm
• Overfitting
• Expressive Capabilities of ANNs
• Summary
Backpropagation Algorithm
• Looks for the minimum of the error function in weight space using the method of gradient descent.
• The combination of weights that minimizes the error function is considered a solution of the learning problem.
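In symbols, each weight is nudged a small step against the gradient of the error, with learning rate $\eta$:

$$\Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}}, \qquad w_{ji} \leftarrow w_{ji} + \Delta w_{ji}.$$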
Sigmoid unit
Error Gradient for a Sigmoid Unit
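Following the Mitchell notes cited on the title slide, for a single sigmoid unit with output $o = \sigma(\sum_i w_i x_i)$ and squared error over training examples $d \in D$, the chain rule together with the identity $\sigma'(net) = \sigma(net)\,(1 - \sigma(net))$ gives:

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D}(t_d - o_d)^2 = -\sum_{d \in D}(t_d - o_d)\, o_d\,(1 - o_d)\, x_{i,d}.$$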
Gradient Descent
Incremental (Stochastic) Gradient Descent
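A minimal sketch of the incremental variant for a single sigmoid unit (NumPy assumed; the OR toy data, learning rate, and epoch count are illustrative, not from the slides): the weights are updated after every individual example rather than after a full pass over the data.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgd_train(X, t, eta=0.5, epochs=1000, seed=0):
    """Incremental (stochastic) gradient descent for one sigmoid unit."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = sigmoid(w @ x)
            # Squared-error gradient for a sigmoid unit: (t - o) * o * (1 - o) * x
            w += eta * (target - o) * o * (1 - o) * x
    return w

# Toy data: learn OR; the first column is a constant-1 bias input.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = sgd_train(X, t)
print(np.round(sigmoid(X @ w)))  # should print approximately [0. 1. 1. 1.]
```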
Backpropagation Algorithm (MLE)
Backpropagation Algorithm (MLE)
Derivation of the BP rule (following the notation of the Mitchell slides cited on the title page):
• Goal: compute $\partial E_d / \partial w_{ji}$ for every weight, for each training example d
• Error: $E_d = \tfrac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$
• Notation: $x_{ji}$ is the i-th input to unit j, $w_{ji}$ its weight, $net_j = \sum_i w_{ji} x_{ji}$, $o_j = \sigma(net_j)$, $t_j$ the target output, and $\delta_j = -\partial E_d / \partial net_j$
Backpropagation Algorithm (MLE)
For output unit j:

$$\delta_j = -\frac{\partial E_d}{\partial net_j} = (t_j - o_j)\, o_j\, (1 - o_j)$$
Backpropagation Algorithm (MLE)
For hidden unit j:

$$\delta_j = o_j\,(1 - o_j) \sum_{k \in downstream(j)} w_{kj}\, \delta_k$$
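Putting the two delta rules together, here is a hedged sketch of one stochastic backpropagation update for a network with a single hidden layer of sigmoid units (NumPy assumed; biases are folded into a constant-1 input and the layer sizes are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, W2, eta=0.5):
    """One stochastic update using the delta rules above.
    Forward pass: h = sigma(W1 x), o = sigma(W2 h)."""
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    delta_o = (t - o) * o * (1 - o)           # output units: (t_j - o_j) o_j (1 - o_j)
    delta_h = h * (1 - h) * (W2.T @ delta_o)  # hidden units: o_j (1 - o_j) sum_k w_kj delta_k
    W2 += eta * np.outer(delta_o, h)          # weight update: eta * delta_j * input_ji
    W1 += eta * np.outer(delta_h, x)
    return o

# Illustrative shapes: 3 inputs (incl. a constant-1 bias input), 2 hidden, 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))
W2 = rng.normal(scale=0.5, size=(1, 2))
backprop_step(np.array([1.0, 0.0, 1.0]), np.array([1.0]), W1, W2)
```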
More on Backpropagation
Structure
• Motivation
• Artificial neural networks
• Learning: Backpropagation Algorithm
• Overfitting
• Expressive Capabilities of ANNs
• Summary
Overfitting in ANNs
Dealing with Overfitting
K-Fold Cross Validation
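A small sketch of the idea (plain NumPy; the fold count and dataset size are arbitrary): split the data into k folds, train on k − 1 of them, validate on the held-out fold, and average the k validation errors.

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle n example indices and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = k_fold_indices(n=100, k=5)
for i, val in enumerate(folds):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Train the network on `train`, evaluate on `val`; averaging the k
    # validation errors gives a less noisy estimate than a single split.
    print(f"fold {i}: {len(train)} train / {len(val)} validation")
```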
Leave-One-Out Cross Validation
• Equivalent to k-fold cross validation with k = n: each example is held out once.
Structure
• Motivation
• Artificial neural networks
• Backpropagation Algorithm
• Overfitting
• Expressive Capabilities of ANNs
• Summary
Expressive Capabilities of ANNs
• Single layer: perceptron
• XOR problem
• 8-3-8 problem
Single Layer: Perceptron
Single Layer: Perceptron
• Representational power of perceptrons: a hyperplane decision surface in the n-dimensional space of instances, $\mathbf{w} \cdot \mathbf{x} = 0$
• Linearly separable sets
• Logical functions: AND, OR, …
• How to learn w? (see the sketch below)
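One standard answer is the perceptron training rule, sketched below (NumPy assumed; the AND data, learning rate, and epoch count are illustrative). The rule provably converges when the examples are linearly separable.

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, epochs=20):
    """Perceptron rule: w <- w + eta * (t - o) * x, where o = step(w . x)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1.0 if w @ x > 0 else 0.0
            w += eta * (target - o) * x
    return w

# Learn AND; the first column is a constant-1 bias input.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
print(perceptron_train(X, t))
```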
Single Layer: Perceptron
• Nonlinear sets of examples?
Multi-layer perceptron, XOR
• k = y1 AND NOT y2 = (x1 OR x2) AND NOT (x1 AND x2) = x1 XOR x2
• Boundaries:
  x1 + x2 − 0.5 = 0 (the OR unit, y1)
  x1 + x2 − 1.5 = 0 (the AND unit, y2)
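These two boundaries can be checked directly. The sketch below wires up the slide's construction with hard-threshold units (the output unit's weights are one illustrative choice that computes y1 AND NOT y2):

```python
def step(a):
    """Hard-threshold unit: 1 if the activation is positive, else 0."""
    return 1.0 if a > 0 else 0.0

def xor_net(x1, x2):
    """The slide's construction: y1 = x1 OR x2, y2 = x1 AND x2,
    output = y1 AND NOT y2 = x1 XOR x2."""
    y1 = step(x1 + x2 - 0.5)    # OR boundary:  x1 + x2 - 0.5 = 0
    y2 = step(x1 + x2 - 1.5)    # AND boundary: x1 + x2 - 1.5 = 0
    return step(y1 - y2 - 0.5)  # fires only when y1 = 1 and y2 = 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_net(a, b)))  # prints the XOR truth table
```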
Multi-layer perceptron
Expressive Capabilities of ANNs
Learning Hidden Layer Representations
• 8-3-8 problem
Autoencoder?
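In modern terms, yes: the 8-3-8 network is a tiny autoencoder. The sketch below trains one with the backpropagation rules from earlier (NumPy assumed; the learning rate and epoch count are illustrative and may need tuning; biases are omitted for brevity). The three hidden activations typically settle near 0/1, i.e., the network invents a roughly 3-bit code for the eight inputs.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Eight one-hot inputs must be reproduced through a 3-unit bottleneck.
X = np.eye(8)
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.3, size=(3, 8))  # encoder weights
W2 = rng.normal(scale=0.3, size=(8, 3))  # decoder weights

eta = 0.5
for _ in range(10000):
    for x in X:
        h = sigmoid(W1 @ x)
        o = sigmoid(W2 @ h)
        d_o = (x - o) * o * (1 - o)
        d_h = h * (1 - h) * (W2.T @ d_o)
        W2 += eta * np.outer(d_o, h)
        W1 += eta * np.outer(d_h, x)

# Hidden codes for the eight inputs: often close to distinct 3-bit patterns.
print(np.round(sigmoid(W1 @ X.T).T, 1))
```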
Training
Neural Nets for Face Recognition
Learning Hidden Layer Representations
Structure
• Motivation
• Artificial neural networks
• Backpropagation Algorithm
• Overfitting
• Expressive Capabilities of ANNs
• Summary
Summary
• Brain
  • Parallel computing
  • Hierarchical network
• Artificial Neural Network
  • Mathematical expression
  • Activation function selection
• Gradient Descent and BP
  • Error back-propagation for hidden units
• Overfitting
• Expressive capabilities of ANNs
  • Decision surfaces, function approximation, hidden layer representations
Thank you!