neural networks eric postma ikat universiteit maastricht
Neural networks
Eric Postma
IKAT
Universiteit Maastricht
Overview
Introduction: The biology of neural networks
• the biological computer
• brain-inspired models
• basic notions
Interactive neural-network demonstrations
• Perceptron
• Multilayer perceptron
• Kohonen’s self-organising feature map
• Examples of applications
A typical AI agent
Two types of learning
• Supervised learning
– curve fitting, surface fitting, ...
• Unsupervised learning
– clustering, visualisation, ...
An input-output function
Fitting a surface to four points
(Artificial) neural networks
The digital computer versus the neural computer
The Von Neumann architecture
The biological architecture
Digital versus biological computers
5 distinguishing properties
• speed
• robustness
• flexibility
• adaptivity
• context-sensitivity
Speed: The “hundred time steps” argument
The critical resource that is most obvious is time. Neurons whose basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps.
Feldman and Ballard (1982)
Flexibility: the Necker cube
vision = constraint satisfaction
Adaptivity
processing implies learning in biological computers
versus
processing does not imply learning in digital computers
Context-sensitivity: patterns
emergent properties
Robustness and context-sensitivity
coping with noise
The neural computer
• Is it possible to develop a model after the natural example?
• Brain-inspired models:
– models based on a restricted set of structural and functional properties of the (human) brain
The Neural Computer (structure)
Neurons, the building blocks of the brain
Synapses, the basis of learning and memory
Learning: Hebb’s rule
(figure: neuron 1, synapse, neuron 2)
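Hebb's rule strengthens a synapse when the neurons on both sides of it are active together. A minimal sketch, assuming the common textbook form in which the weight change is the product of the two activities and a learning rate (the name `hebb_update` and the value of `eta` are illustrative, not taken from the slides):

```python
def hebb_update(w, pre, post, eta=0.1):
    """Return the synaptic weight after one Hebbian step.

    Assumed form: delta_w = eta * pre * post, where `pre` is the activity
    of neuron 1 and `post` is the activity of neuron 2.
    """
    return w + eta * pre * post


# Both neurons active: the synapse is strengthened.
w_both = hebb_update(0.5, pre=1.0, post=1.0)
# Only neuron 1 active: the weight does not change.
w_one = hebb_update(0.5, pre=1.0, post=0.0)
```

In this form the weight can only grow for non-negative activities, which is why practical variants usually add some normalisation or decay.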
Connectivity
An example: The visual system is a feedforward hierarchy of neural modules
Every module is (to a certain extent) responsible for a certain function
(Artificial) Neural Networks
• Neurons
– activity
– nonlinear input-output function
• Connections
– weight
• Learning
– supervised
– unsupervised
Artificial Neurons
• input (vectors)
• summation (excitation)
• output (activation)
(figure: inputs i1, i2, i3 feed a neuron with excitation e and output a = f(e))
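The three steps of the artificial neuron (input, summation, output) can be sketched as follows. The weights and the choice of a logistic output function are assumptions made for illustration; the names `neuron` and `logistic` are hypothetical.

```python
import math


def logistic(e):
    """A nonlinear output function (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-e))


def neuron(inputs, weights, f):
    """Weighted summation followed by a nonlinear output function."""
    e = sum(w * i for w, i in zip(weights, inputs))  # excitation e
    return f(e)                                      # activation a = f(e)


# Three inputs i1, i2, i3 with illustrative weights.
a = neuron([1.0, 0.0, 1.0], [0.5, -0.3, -0.5], logistic)
```

Here the weighted inputs cancel out, so the excitation is 0 and the neuron sits at the midpoint of its output range.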
Input-output function
• nonlinear function: $f(x) = \frac{1}{1 + e^{-x/a}}$
(figure: f(e) plotted against e, for the limiting cases a → 0 and a → ∞)
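The role of the parameter a in the input-output function can be checked numerically. A small sketch, assuming the logistic form f(x) = 1 / (1 + e^(-x/a)); the specific values of `a` are illustrative.

```python
import math


def sigmoid(x, a=1.0):
    """f(x) = 1 / (1 + exp(-x / a)); `a` controls the steepness.

    Note: for very negative x / a, math.exp can overflow; this sketch
    does not guard against that.
    """
    return 1.0 / (1.0 + math.exp(-x / a))


steep = sigmoid(0.5, a=0.01)    # small a: close to a step function
flat = sigmoid(0.5, a=100.0)    # large a: almost flat around 0.5
middle = sigmoid(0.0)           # the midpoint does not depend on a
```

A small a makes the neuron behave almost like a threshold unit, while a large a makes it nearly insensitive to its excitation.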
Artificial Connections (Synapses)
• w_AB
– The weight of the connection from neuron A to neuron B
(figure: neuron A connected to neuron B with weight w_AB)
The Perceptron
Learning in the Perceptron
• Delta learning rule
– the difference between the desired output t and the actual output o, given input x
• Global error E
– is a function of the differences between the desired and actual outputs
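The delta rule can be sketched as gradient descent on the global error E. The example below assumes a single sigmoid output unit, a squared-error form E = ½ Σ (t − o)², batch updates, and the logical AND as training set; the learning rate and the number of epochs are illustrative choices, not values from the slides.

```python
import numpy as np


def sigmoid(e):
    return 1.0 / (1.0 + np.exp(-e))


# Logical AND: a linearly separable problem (illustrative choice).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)   # connection weights
b = 0.0           # bias (threshold) term
eta = 1.0         # learning rate (hypothetical value)

for epoch in range(5000):
    o = sigmoid(X @ w + b)            # actual outputs for all patterns
    delta = (t - o) * o * (1.0 - o)   # error times the sigmoid derivative
    w += eta * X.T @ delta            # step down the gradient of E
    b += eta * delta.sum()

o = sigmoid(X @ w + b)
E = 0.5 * np.sum((t - o) ** 2)        # global error after training
predictions = (o > 0.5).astype(int)
```

Because AND is linearly separable, a single unit is enough; the same code cannot drive E towards zero for XOR.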
Gradient Descent
Linear decision boundaries
The history of the Perceptron
• Rosenblatt (1959)
• Minsky & Papert (1969)
• Rumelhart & McClelland (1986)
The multilayer perceptron
(figure: input, hidden, and output layers)
Training the MLP
• supervised learning
– each training pattern: input + desired output
– in each epoch: present all patterns
– at each presentation: adapt weights
– after many epochs convergence to a local minimum
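The training procedure above can be sketched for a small MLP. This is a minimal illustration, assuming sigmoid units, a squared-error measure, four hidden units, and XOR as the training set; all of these, along with the learning rate and the random seed, are assumptions rather than details from the slides.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)      # hidden activations
    o = sigmoid(W2 @ h + b2)      # output activations
    return h, o


def total_error(X, T, W1, b1, W2, b2):
    return sum(0.5 * np.sum((t - forward(x, W1, b1, W2, b2)[1]) ** 2)
               for x, t in zip(X, T))


# XOR: not linearly separable, so a hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-1, 1, (4, 2)), np.zeros(4)   # input -> hidden
W2, b2 = rng.uniform(-1, 1, (1, 4)), np.zeros(1)   # hidden -> output
eta = 0.5                                          # learning rate

error_before = total_error(X, T, W1, b1, W2, b2)
for epoch in range(5000):                 # one epoch = all patterns once
    for x, t in zip(X, T):                # adapt weights at each presentation
        h, o = forward(x, W1, b1, W2, b2)
        delta_o = (t - o) * o * (1 - o)               # output error signal
        delta_h = (W2.T @ delta_o) * h * (1 - h)      # back-propagated signal
        W2 += eta * np.outer(delta_o, h)
        b2 += eta * delta_o
        W1 += eta * np.outer(delta_h, x)
        b1 += eta * delta_h
error_after = total_error(X, T, W1, b1, W2, b2)
```

Training only guarantees convergence to a local minimum, so a different initialisation may end at a different final error.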
Phoneme recognition with an MLP
input: frequencies
output: pronunciation
Non-linear decision boundaries
Compression with an MLP: the autoencoder
hidden representation
Learning in the MLP
Preventing Overfitting
GENERALISATION = performance on test set
• Early stopping
• Training, Test, and Validation set
• k-fold cross validation
– leaving-one-out procedure
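How k-fold cross validation divides the data can be sketched with plain index lists. The function name `k_fold_splits` is hypothetical, and the sketch assumes contiguous, unshuffled folds; leaving-one-out is the special case in which k equals the number of samples.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_splits(10, 5))   # 5-fold cross validation
loo = list(k_fold_splits(4, 4))      # leaving-one-out: k = n_samples
```

Each sample appears in exactly one test fold, so the averaged test performance uses every sample once as unseen data.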
Image Recognition with the MLP
Hidden Representations
Other Applications
• Practical
– OCR
– financial time series
– fraud detection
– process control
– marketing
– speech recognition
• Theoretical
– cognitive modeling
– biological modeling
Some mathematics…
Perceptron
Derivation of the delta learning rule
(figure: derivation with labels "target output", "actual output", and h = i)
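A sketch of the usual derivation of the delta learning rule, assuming a single output unit with output function f, a squared-error measure, and a learning rate η. The symbols t, o, x, and E follow the earlier slides; the pattern index p, the excitation e, and η are assumptions.

```latex
\begin{aligned}
E &= \tfrac{1}{2} \sum_{p} \left(t^{p} - o^{p}\right)^{2},
\qquad o^{p} = f\left(e^{p}\right), \quad e^{p} = \sum_{i} w_{i} x_{i}^{p} \\
\frac{\partial E}{\partial w_{i}}
  &= -\sum_{p} \left(t^{p} - o^{p}\right) f'\left(e^{p}\right) x_{i}^{p} \\
\Delta w_{i} &= -\eta \frac{\partial E}{\partial w_{i}}
  = \eta \sum_{p} \left(t^{p} - o^{p}\right) f'\left(e^{p}\right) x_{i}^{p}
\end{aligned}
```

Each weight therefore changes in proportion to the difference between target and actual output, scaled by the input on that connection.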
Sigmoid function
• May also be the tanh function
– (<-1,+1> instead of <0,1>)
• Derivative f’(x) = f(x) [1 – f(x)]
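The derivative identity can be verified numerically. A short check, assuming the logistic sigmoid with a = 1 and comparing the formula against a central finite difference; the test point and step size are arbitrary choices.

```python
import math


def f(x):
    return 1.0 / (1.0 + math.exp(-x))


def f_prime(x):
    """Derivative of the sigmoid expressed through the sigmoid itself."""
    return f(x) * (1.0 - f(x))


x, h = 0.3, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central finite difference
analytic = f_prime(x)

# tanh covers <-1,+1>, the sigmoid covers <0,1>.
tanh_low, tanh_high = math.tanh(-5.0), math.tanh(5.0)
```

The identity is convenient in training because the derivative comes for free from the activation that was already computed in the forward pass.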
Derivation generalized delta rule
Error function (LMS)
Adaptation hidden-output weights
Adaptation input-hidden weights
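A sketch of the generalized delta rule in its usual form, assuming a squared-error (LMS) function, one hidden layer, and a learning rate η. The indices i (input), j (hidden), and k (output), the hidden activation h_j, and the excitations e_j and e_k are assumptions; w_jk denotes the weight from hidden unit j to output unit k.

```latex
\begin{aligned}
E &= \tfrac{1}{2} \sum_{k} \left(t_{k} - o_{k}\right)^{2} \\
\delta_{k} &= \left(t_{k} - o_{k}\right) f'\left(e_{k}\right),
& \Delta w_{jk} &= \eta\, \delta_{k}\, h_{j} \\
\delta_{j} &= f'\left(e_{j}\right) \sum_{k} w_{jk}\, \delta_{k},
& \Delta w_{ij} &= \eta\, \delta_{j}\, x_{i}
\end{aligned}
```

The hidden-output weights use the output error directly, while the input-hidden weights use error signals propagated backwards through the hidden-output weights.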
Forward and Backward Propagation
Decision boundaries of Perceptrons
Straight lines (surfaces), linearly separable
Decision boundaries of MLPs
Convex areas (open or closed)
Decision boundaries of MLPs
Combinations of convex areas
Learning and representing similarity
Alternative conception of neurons
• Neurons do not take the weighted sum of their inputs (as in the perceptron), but measure the similarity of the weight vector to the input vector
• The activation of the neuron is a measure of similarity. The more similar the weight is to the input, the higher the activation
• Neurons represent “prototypes”
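A similarity-measuring neuron can be sketched as follows. The Gaussian of the Euclidean distance is one possible similarity measure, chosen here as an assumption; the name `prototype_activation` and the `width` parameter are hypothetical.

```python
import numpy as np


def prototype_activation(weight, inp, width=1.0):
    """Activation is highest when the input matches the weight vector."""
    distance = np.linalg.norm(np.asarray(inp) - np.asarray(weight))
    return float(np.exp(-(distance / width) ** 2))


w = [1.0, 2.0]                                  # the neuron's "prototype"
exact = prototype_activation(w, [1.0, 2.0])     # identical input
near = prototype_activation(w, [1.2, 2.1])      # similar input
far = prototype_activation(w, [4.0, -1.0])      # dissimilar input
```

The activation falls off smoothly with distance, so several neighbouring prototypes respond to the same input, each to a different degree.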
Coarse Coding
2nd-order isomorphism
Prototypes for preprocessing
Kohonen’s SOFM (Self Organizing Feature Map)
• Unsupervised learning
• Competitive learning
(figure: n-dimensional input, output map, winner)
Competitive learning
• Determine the winner (the neuron of which the weight vector has the smallest distance to the input vector)
• Move the weight vector w of the winning neuron towards the input i
(figure: before learning, w lies apart from i; after learning, w has moved towards i)
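One step of competitive learning can be sketched as follows. The Euclidean distance and the learning rate `eta` are assumptions; the function name `competitive_step` is hypothetical.

```python
import numpy as np


def competitive_step(weights, i, eta=0.5):
    """Move the winning neuron's weight vector towards input i (in place)."""
    distances = np.linalg.norm(weights - i, axis=1)
    winner = int(np.argmin(distances))            # smallest distance wins
    weights[winner] += eta * (i - weights[winner])
    return winner


weights = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
i = np.array([2.0, 1.0])
winner = competitive_step(weights, i)
```

With `eta=0.5` the winner moves halfway towards the input; the other weight vectors stay where they are.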
Kohonen’s idea
• Impose a topological order onto the competitive neurons (e.g., rectangular map)
• Let neighbours of the winner share the “prize” (the “postcode lottery” principle)
• After learning, neurons with similar weights tend to cluster on the map
Topological order
neighbourhoods
• Square
– winner (red)
– nearest neighbours
• Hexagonal
– winner (red)
– nearest neighbours
A simple example
• A topological map of 2 x 3 neurons and two inputs
(figure: 2D input, weights, and visualisation)
Weights before training
Input patterns (note the 2D distribution)
Weights after training
Another example
• Input: uniformly randomly distributed points
• Output: Map of 20² neurons
• Training
– Starting with a large learning rate and neighbourhood size, both are gradually decreased to facilitate convergence
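The training scheme can be sketched for a rectangular map. This is a minimal illustration, assuming a Gaussian neighbourhood on the grid and linear decay of both the learning rate and the neighbourhood size; the function name `train_som`, the decay schedule, and all parameter values are assumptions, and the map is kept small (5 x 5) so the example runs quickly.

```python
import numpy as np


def train_som(data, rows, cols, epochs=20, eta0=0.5, sigma0=None, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.uniform(0, 1, (rows, cols, data.shape[1]))
    # Grid coordinates of every neuron, used for neighbourhood distances.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / n_steps
            eta = eta0 * (1.0 - frac)                    # decreasing learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # shrinking neighbourhood
            dist = np.linalg.norm(weights - x, axis=2)
            winner = np.unravel_index(np.argmin(dist), dist.shape)
            grid_d2 = np.sum((grid - np.array(winner)) ** 2, axis=2)
            # Neighbours of the winner share the "prize".
            share = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += eta * share[:, :, None] * (x - weights)
            step += 1
    return weights


rng = np.random.default_rng(1)
data = rng.uniform(0, 1, (500, 2))     # uniformly distributed 2D points
som = train_som(data, rows=5, cols=5)
```

Starting with a wide neighbourhood lets the whole map unfold over the input distribution; the later, narrower updates fine-tune individual neurons.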
Dimension reduction
Adaptive resolution
Application of SOFM
Examples (input) SOFM after training (output)
Visual features (biologically plausible)
• Principal Components Analysis (PCA)
(figure: projections of data onto the principal components pca1 and pca2)
Relation with statistical methods 1
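Projecting data onto its principal components can be sketched with a singular value decomposition. The data set below is synthetic and hypothetical; the function name `pca_project` is illustrative.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project data onto its first principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Hypothetical data: 3D points that vary mostly along one direction.
t = rng.normal(size=200)
data = np.column_stack([t, 2 * t, 0.5 * t]) + 0.05 * rng.normal(size=(200, 3))
projected = pca_project(data)    # columns correspond to pca1 and pca2
```

The first column carries most of the variance, which is what makes the projection useful for visualisation and dimension reduction.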
Relation with statistical methods 2
• Multi-Dimensional Scaling (MDS)
• Sammon Mapping
Distances in high-dimensional space
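MDS and Sammon mapping both look for a low-dimensional arrangement that preserves the distances of the high-dimensional space. The sketch below only evaluates how well a given arrangement does so, using Sammon's stress; it does not perform the optimisation itself, and the example points are hypothetical.

```python
import numpy as np


def pairwise_distances(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


def sammon_stress(high, low):
    """Sammon's stress between a high-dimensional set and its low-dimensional map.

    Assumes all points in `high` are distinct (no zero distances).
    """
    iu = np.triu_indices(len(high), k=1)      # count each pair once
    d_high = pairwise_distances(high)[iu]
    d_low = pairwise_distances(low)[iu]
    return float(np.sum((d_high - d_low) ** 2 / d_high) / np.sum(d_high))


high = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
stress_perfect = sammon_stress(high, high.copy())   # distances preserved
stress_flat = sammon_stress(high, high[:, :2])      # third coordinate dropped
```

Dividing each squared difference by the original distance gives small distances more weight, so local structure is emphasised.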
Image Mining: the right feature
Fractal dimension in art
Jackson Pollock (Jack the Dripper)
Taylor, Micolich, and Jonas (1999). Fractal analysis of Pollock’s drip paintings. Nature, 399, 422 (3 June).
(figure: fractal dimension plotted against creation date, with the range for natural images marked)
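One common way to estimate the fractal dimension of an image is box counting. The sketch below assumes a square binary image whose side is divisible by every box size; whether the cited study used exactly this procedure is not stated on the slide, and the function name `box_counting_dimension` is hypothetical.

```python
import numpy as np


def box_counting_dimension(image, box_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the fractal dimension of a square binary image."""
    counts = []
    n = image.shape[0]
    for s in box_sizes:
        # Split the image into s x s boxes and count boxes with any set pixel.
        boxes = image.reshape(n // s, s, n // s, s).any(axis=(1, 3))
        counts.append(boxes.sum())
    # Slope of log(count) against log(1 / box size).
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope


line = np.eye(64, dtype=bool)             # a diagonal line
square = np.ones((64, 64), dtype=bool)    # a filled square
d_line = box_counting_dimension(line)
d_square = box_counting_dimension(square)
```

A line gives a dimension of about 1 and a filled square about 2; fractal patterns fall in between.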
Our Van Gogh research
Two painters
• Vincent Van Gogh paints Van Gogh
• Claude-Emile Schuffenecker paints Van Gogh
Sunflowers
• Is it made by
– Van Gogh?
– Schuffenecker?
Approach
• Select appropriate features (skipped here, but very important!)
• Apply neural networks
(figure: Van Gogh, Schuffenecker)
Training Data
Van Gogh (5000 textures), Schuffenecker (5000 textures)
Results
• Generalisation performance
• 96% correct classification on untrained data
Results, cont.
• Trained art-expert network applied to Yasuda sunflowers
• 89% of the textures are classified as a genuine Van Gogh
A major caveat…
• Not only the painters are different…
• …but also the material
and maybe many other things…