EE1411
Universal Learning Models
Janusz A. Starzyk
Computational Intelligence
Based on a course taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika
Task learning
We want to combine Hebbian learning with error-driven learning, hidden units, and biologically justified models.
Hebbian networks model states of the world but not perception-action.
Error correction can learn input-output mappings. Unfortunately, the delta rule works only for output units, not for hidden units, because it must be given a target value.
Backpropagation of errors can train hidden units, but there is no good biological justification for this method.
The idea of backpropagation is simple, but the detailed algorithm requires many calculations.
Main idea: we seek the minimum of an error function measuring the difference between the desired behavior and the behavior realized by the network.
Error function
E(w) – error function, dependent on all network parameters w; it is the sum of errors E(X;w) over all training patterns X.
o_k(X;w) – value produced on output no. k of the network for pattern X.
t_k(X) – value desired on output no. k for pattern X.
For one pattern X and one parameter w: E(X;w) = [t(X) − o(X;w)]²
An error value E = 0 is not always attainable; the network may not have enough parameters to learn the desired behavior, so we can only aim for the smallest error.
At the minimum of E(X;w) with respect to parameter w, the derivative dE(X;w)/dw = 0.
With many parameters we collect all the derivatives dE/dw_i into the gradient of E.
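The idea of descending along the gradient of the error function can be sketched as follows. This is a minimal illustration for a single linear output, with an invented toy problem; the function name, learning rate, and data are illustrative choices, not part of the course material.

```python
import numpy as np

# A minimal sketch of gradient descent on the squared-error function
# E(w) = sum_X [t(X) - o(X;w)]^2 for a single linear output o = w . x.
def gradient_descent(X, t, lr=0.05, epochs=500):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = X @ w                    # network outputs for all patterns
        grad = -2.0 * X.T @ (t - o)  # gradient dE/dw, summed over patterns
        w -= lr * grad               # step against the gradient
    return w

# Toy problem: the targets were generated by w* = (2, -1),
# so minimizing E should recover those weights.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
t = X @ np.array([2.0, -1.0])
w = gradient_descent(X, t)
```

With enough epochs and a small enough learning rate, the weights converge to the generating values, i.e. to a point where the gradient vanishes.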
Error propagation
The delta rule minimizes the error of a single neuron, e.g. an output neuron reached by signals s_i:
Δw_ik = ε (t_k − o_k) s_i
What signals should we use for hidden neurons? First we propagate signals forward through the network, computing the activations and output signals of the neurons h_i, through all layers, up to the outputs o_k (forward pass). We compute the errors δ_k = (t_k − o_k) and the corrections for the output weights, Δw_ik = ε δ_k h_i. The error for hidden neuron j: δ_j = Σ_k w_jk δ_k h_j(1 − h_j) (backward pass, backpropagation of error). The correction is strongest for undecided units, with activations near 0.5.
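The forward and backward passes described above can be sketched for one hidden layer of sigmoid units. The shapes, learning rate, and the XOR demonstration are illustrative choices, not from the text; the deltas follow the slide's formulas.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training step: forward pass, backward pass, weight corrections.
def backprop_step(x, t, W_in, W_out, eps=0.5):
    h = sigmoid(x @ W_in)                        # forward: hidden signals h_i
    o = sigmoid(h @ W_out)                       # forward: outputs o_k
    delta_o = t - o                              # delta_k = (t_k - o_k)
    delta_h = (delta_o @ W_out.T) * h * (1 - h)  # backward: hidden errors delta_j
    W_out += eps * np.outer(h, delta_o)          # Delta w_ik = eps * delta_k * h_i
    W_in += eps * np.outer(x, delta_h)
    return np.sum((t - o) ** 2)                  # squared error before the update

# Train on XOR, a mapping the delta rule alone cannot learn.
rng = np.random.default_rng(0)
W_in, W_out = rng.normal(0, 0.5, (2, 4)), rng.normal(0, 0.5, (4, 1))
patterns = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
errors = []
for _ in range(2000):
    errors.append(sum(backprop_step(np.array(x, float), np.array(t, float),
                                    W_in, W_out) for x, t in patterns))
```

Because the hidden units transform the input representation, the total error on XOR falls as training proceeds.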
GeneRec
Although most models used in psychology train multilayer perceptron structures with variants of backpropagation (in this way one can learn any function), the idea of transmitting error information backwards has no biological justification.
GeneRec (General Recirculation, O'Reilly 1996): bi-directional signal propagation, asymmetrical weights w_kj ≠ w_jk.
First phase (−): the response of the network to the input activation x⁻ gives the output y⁻; then the desired result y⁺ is observed and propagated back to the input, giving x⁺. The weight change requires information about the signals from both phases.
GeneRec - learning
The learning rule agrees with the delta rule:
In comparison with backpropagation, the difference of signals [y⁺ − y⁻] replaces the aggregated error; (difference of signals) ≈ (difference of activations) × (derivative of the activation function), so it is a gradient rule.
The GeneRec rule: Δw_ij = ε (y_j⁺ − y_j⁻) x_i⁻
For biases x_i = 1, so: Δb_j = ε (y_j⁺ − y_j⁻)
Bi-directional information transfer is almost simultaneous; it accounts for the formation of attractor states, constraint satisfaction, and pattern completion.
The P300 wave, which appears about 300 ms after stimulation, reflects expectations resulting from external activation. Errors are the result of activity in the whole network; we get slightly better results by taking the average [x⁺ + x⁻]/2 and enforcing weight symmetry:
Δw_ij = ε (x_i⁺ y_j⁺ − x_i⁻ y_j⁻) – the CHL rule (Contrastive Hebbian Learning)
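The CHL update above is simply the difference of two Hebbian products, one per phase. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

# CHL update: given unit activities from the minus phase (the network's own
# answer) and the plus phase (the desired outcome clamped), the weight change
# is the difference of the two Hebbian outer products.
def chl_update(x_minus, y_minus, x_plus, y_plus, eps=0.1):
    hebb_plus = np.outer(x_plus, y_plus)      # correlations in the + phase
    hebb_minus = np.outer(x_minus, y_minus)   # correlations in the - phase
    return eps * (hebb_plus - hebb_minus)     # Delta w_ij

# A mismatch between phases drives learning; agreement leaves weights alone.
dW = chl_update([1.0, 0.0], [0.5], [1.0, 0.0], [1.0])
```

Note that when the two phases agree exactly, the update is zero: learning stops once the network's own settling reproduces the desired outcome.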
Two phases
Where does the error signal for correcting synaptic connections come from?
The layer on the right = the middle layer at time t+1; e.g. a) word pronunciation: external correction of the action; b) external expectations and someone's pronunciation; c) expected results of an action and their observation; d) reconstruction (expected input).
GeneRec properties
Hebbian learning creates a model of the world by remembering correlations, but it is not capable of learning task execution.
Hidden layers allow the problem to be transformed, and error correction permits learning difficult tasks: the relationships between inputs and outputs.
The combination of Hebbian learning (correlations x·y) and error-based learning can learn everything in a biologically plausible manner: CHL leads to weight symmetry, approximate symmetry suffices, and connections are generally bidirectional. Err = CHL in the table.
Lack of Ca²⁺ = no learning; little Ca²⁺ = LTD; much Ca²⁺ = LTP. LTD corresponds to unfulfilled expectations: only the − phase occurs, with no + reinforcement.
Combination of Hebb + errors

                 Advantages                Disadvantages
Hebb (local)     autonomous, reliable      narrow, greedy
Error (remote)   purposeful, cooperative   interdependent, lazy
It's good to combine Hebbian learning with CHL error correction.
CHL is like socialism: it tries to correct the errors of the whole and limits individual motivation; common responsibility gives low effectiveness and planned activity.
Hebbian learning is like capitalism: based on greed, local interests, and individualism; effective activity, but no monitoring of the whole.
Combination of Hebb + errors
It's good to combine Hebbian learning with CHL error correction.
Correlations and errors are combined with a weighting parameter:
Δw_ij = c_Hebb Δw_Hebb + (1 − c_Hebb) Δw_Err
Additionally, inhibition within layers is necessary: it creates economical internal representations, units compete with each other, only the best remain and specialize, and it makes self-organized learning possible.
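Inhibitory competition within a layer can be sketched as k-winners-take-all. This is a simplified stand-in for Leabra's kWTA inhibition, not its exact algorithm; the function name and parameters are illustrative.

```python
import numpy as np

# k-winners-take-all: only the k most active units in a layer keep their
# activity, the rest are silenced, producing a sparse representation.
def kwta(activations, k):
    act = np.asarray(activations, dtype=float)
    winners = np.argsort(act)[-k:]   # indices of the k strongest units
    out = np.zeros_like(act)
    out[winners] = act[winners]      # survivors keep their activity
    return out

sparse = kwta([0.1, 0.9, 0.4, 0.7], k=2)
```

Only the two strongest units survive, so downstream layers see an economical, competitive code.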
Simulation of a difficult problem
Genrec.proj.gz, Chapt. 5.9; 3 hidden units. Learning is interrupted after 5 epochs without error.
Errors during learning show substantial fluctuations: networks with recurrence are sensitive to small changes in weights and explore different solutions. Compare with learning easy and difficult tasks using only Hebb.
Inhibitory competition as a constraint
Inhibition:
leads to sparse distributed representations (many representations, only some useful in a concrete situation)
drives competition and specialization: survival of the best adapted
enables self-organized learning, often more important than Hebbian learning
Inhibition was also used in the mixture-of-experts framework: gating units, subject to WTA competition, control the outputs of the experts.
Comparison of weight change in learning
View of hidden-layer weights in Hebbian learning: the weights develop in reference to particular inputs.
View of hidden-layer weights in error-correction learning: the weights appear fairly random compared with Hebbian learning.
Comparison of weight change in learning
Charts compare a) training errors and b) number of cycles, as functions of the number of training epochs, for three learning methods: Hebbian (Pure Hebb), error correction (Pure Err), and their combination (Hebb & Err), which attained the best results.
Full Leabra model
Inhibition within layers; Hebbian learning + error correction for the weights between layers.
Six principles of intelligent system construction:
1. Biological realism
2. Distributed representations
3. Inhibitory competition
4. Bidirectional activation propagation
5. Error-driven learning
6. Hebbian learning
Generalization
How do we deal with things we've never seen before?
Every time we enter a classroom, every meeting, every sentence you hear, etc.
We constantly encounter new situations, and we generalize to them reasonably.
How do we do this?
Good representations
Internal distributed representations: new concepts are combinations of existing properties.
Hebbian learning plus inhibition-based competition constrain error correction so that good representations are created.
Generalization in attractor networks
The GeneRec rule by itself doesn't lead to good generalization. Simulations: model_and_task.proj.gz, Chapt. 6.
The Hebb parameter controls how much CHL and how much Hebbian learning is used.
Pure_err uses only CHL; check the − and + phases.
Compare the internal representations for the different types of learning.
Deep networks
To learn difficult problems, many transformations are necessary, strongly changing the representation of the problem.
Error signals become weak and learning is difficult.
We must add constraints and self-organizing learning.
Analogy: balancing several connected sticks is difficult, but adding self-organizing learning between the segments simplifies this significantly, like adding a gyroscope to each element.
Sequential learning
Besides recognition of objects and relationships and task execution, sequential learning is important, e.g. the sequence of words in sentences:
The dog bit the man. The man bit the dog.
The child lifted up the toy.
I drove through the intersection because the car on the right was just approaching.
The meaning of words, gestures, behaviors, depends on the sequence, the context.
Time plays a fundamental role: the consequences of the appearance of pattern X may be visible only after a delay, e.g. the consequences of the position of pieces during a game become evident only after a few moves.
Network models react immediately – how do brains do this?
Family tree
Example simulation: family_trees.proj.gz, Chapt. 6.4.1
What is still missing? Temporal and sequential relationships!
Sequential learning
Cluster plot showing the representation of hidden-layer neurons a) before learning and b) after learning with the combined Hebbian and error-correction method.
The trained network has two branches corresponding to the two families.
Sequential learning
Categories of temporal relationships:
sequences with a given structure
events delayed in time
continuous trajectories
Context is represented in the frontal lobes of the cortex; it should affect the hidden layer. We need recurrent networks that can hold onto context information for a period of time: the Simple Recurrent Network (SRN), or Elman network, in which the context layer is a copy of the hidden layer.
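A forward pass of such an Elman-style network can be sketched as follows. The weights here are random placeholders rather than trained values, and the class name and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simple Recurrent Network: the context layer holds a copy of the previous
# hidden state, so the hidden layer sees both the current input and the
# recent past.
class SRN:
    def __init__(self, n_in, n_hid, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.5, (n_in, n_hid))
        self.W_ctx = rng.normal(0, 0.5, (n_hid, n_hid))  # context -> hidden
        self.W_out = rng.normal(0, 0.5, (n_hid, n_out))
        self.context = np.zeros(n_hid)                   # hidden state at t-1

    def step(self, x):
        h = sigmoid(x @ self.W_in + self.context @ self.W_ctx)
        self.context = h.copy()      # context layer copies the hidden layer
        return sigmoid(h @ self.W_out)

net = SRN(n_in=3, n_hid=4, n_out=2)
out = net.step(np.array([1.0, 0.0, 0.0]))
```

Because the context carries over between calls, presenting the same input twice generally produces different outputs: the network's response depends on the sequence, not just the current stimulus.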
Sequential learning
Biological justification for context representation: the frontal lobes of the cortex.
They are responsible for planning and performing temporally extended activities. People with damaged frontal lobes have trouble performing a sequence of actions, even though they have no problem with the individual steps.
The frontal lobes are responsible for temporal representations: for example, words such as "fly" or "pole" acquire meaning based on context. Context is a function of previously acquired information.
People with schizophrenia can use the context directly before an ambiguous word, but not context from a previous sentence.
Context representations not only lead to sequential behavior but are also necessary for understanding sequentially presented information such as speech.
Examples of sequential learning
Can we discover the rules of sequence creation? Examples:
BTXSEBPVPSEBTSXXTVVEBPTVPSE
A finite-state machine produces these sequences through its state transitions.
Are these sequences acceptable?
BTXXTTVVETSXSEVVSXEBSSXSE
As studies have shown, people learn more quickly to recognize letter strings produced according to a specific pattern, even if they don't know the rules being used.
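A generator of this kind can be sketched as a small finite-state machine. The transition table below is invented for illustration and is NOT the grammar used on the slide; from each state the machine randomly picks an allowed move, emits a letter, and jumps to the next state.

```python
import random

# Toy finite-state sequence generator over the letters B, T, P, S, X, V, E.
TRANSITIONS = {
    0: [("B", 1)],                   # every string starts with B
    1: [("T", 2), ("P", 3)],
    2: [("S", 2), ("X", 3)],
    3: [("V", 3), ("E", None)],      # E ends the string
}

def generate(seed=None):
    rng = random.Random(seed)
    state, letters = 0, []
    while state is not None:
        letter, state = rng.choice(TRANSITIONS[state])
        letters.append(letter)
    return "".join(letters)

sample = generate(seed=1)
```

Strings from this machine always begin with B and end with E, but their length varies, so the regularity is statistical rather than obvious: exactly the kind of hidden structure people (and networks) can learn implicitly.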
Network realization
The network randomly chooses one of two possible transitions.
Hidden/context neurons learn to represent the machine's states, not only the labels.
Behavior modeling: the same observations but different internal states lead to different decisions and next states.
Project fsa.proj.gz, Chapt. 6.6.3
Temporal delay and reinforcement
The reward (reinforcement) often follows with a delay, e.g. when learning a game or behavioral strategies.
Idea: we must predict sufficiently early which events lead to a reward. This is done by the temporal-differences algorithm (TD, Sutton). Where does a reward signal come from in the brain?
The midbrain dopaminergic system modulates the activity of the basal ganglia (BG) through the substantia nigra (SN), and of the frontal cortex through the ventral tegmental area (VTA). It is a rather complicated system, whose actions are related to evaluating stimuli and actions from the point of view of value and reward.
Temporal delay and reinforcement
The ventral tegmental area (VTA) is part of the reward system.
VTA neurons deliver the neurotransmitter dopamine (DA) to the frontal lobes and the basal ganglia, modulating learning in these areas responsible for planning and action.
More advanced regions of the brain are responsible for producing this global learning signal.
Studies of patients with damage in the VTA area indicate its role in predicting reward and punishment.
Anticipation of reward and result
Anticipation of reward and the reaction to a decision (Knutson et al., 2001)
Basal ganglia (BG)
VTA neurons first learn to react to a reward, and then to predict the appearance of the reward ahead of time.
Formulation sketch - the TD algorithm
We need to define a value function: the sum over all future rewards, with rewards further away in time counting less:
V(t) = r(t+1) + γ r(t+2) + γ² r(t+3) + …,  0 ≤ γ ≤ 1
The adaptive critic (AC) learns to estimate the value function V(t); at every point in time, the AC tries to predict the total future reward.
This can be done recursively:
V(t) = r(t+1) + γ V(t+1)
Error of the predicted reward:
δ(t) = [r(t+1) + γ V̂(t+1)] − V̂(t)
The network tries to reduce this error. The name of the algorithm, TD (temporal differences), refers to this error in the value estimate across a step in time.
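The recursion and error above can be sketched as tabular TD(0) learning for the conditioning setup discussed later (a reward arriving at t=16). The episode length, learning rate, number of episodes, and γ = 1 are illustrative choices; γ < 1 would discount distant rewards as in the formula.

```python
# Tabular TD(0): the adaptive critic's value estimates V_hat(t) are nudged
# to reduce the TD error delta(t) = r(t+1) + gamma*V_hat(t+1) - V_hat(t).
def td_learn(T=17, reward_time=16, gamma=1.0, lr=0.1, episodes=500):
    V = [0.0] * (T + 1)                          # value estimates V_hat(t)
    for _ in range(episodes):
        for t in range(T):
            r = 1.0 if t + 1 == reward_time else 0.0
            delta = r + gamma * V[t + 1] - V[t]  # TD error
            V[t] += lr * delta                   # reduce the prediction error
    return V

V = td_learn()
```

Over repeated episodes the prediction of the reward propagates backwards in time: not only V(15), which directly precedes the reward, but also much earlier steps come to predict it.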
Network implementation
Prediction of activity and error.
Conditioned stimulus CS at t=2; unconditioned stimulus (reward) US at t=16. Project rl_cond.proj.gz.
Initially there is a large error at t=16, because the reward r(16) is unexpected.
Adaptive critic (AC)
Two-phase implementation
(Phase +) computes the expected size of the reward at time t+1 (the value of r).
(Phase −) at step t−k predicts step t−k+1; at the end, the reward r.
The value V̂(t+1) from the + phase is carried over to the value V̂(t) in the − phase:
V⁻(t) = V̂(t),  V⁺(t) = r(t+1) + γ V̂(t+1)
Learning progresses backwards in time, affecting the value of the previous step.
CS at t=2
US at t=16
Two-phase implementation
The system learns that the stimulus (a tone) predicts the reward.
The CSC input (Complete Serial Compound) uses a unique element for each stimulus at each point in time.
This is not a very realistic model of classical conditioning.
Chapt. 6.7.3, project rl_cond.proj.gz