Unsupervised Learning

Neural Networks Lecture 16: Counterpropagation, November 9, 2010



Page 1: Unsupervised Learning


Unsupervised Learning

So far, we have only looked at supervised learning, in which an external teacher improves network performance by comparing desired and actual outputs and modifying the synaptic weights accordingly.

However, most of the learning that takes place in our brains is completely unsupervised.

This type of learning is aimed at achieving the most efficient representation of the input space, regardless of any output space.

Unsupervised learning can also be useful in artificial neural networks.

Page 2: Unsupervised Learning


Unsupervised Learning

Applications of unsupervised learning include

• Clustering

• Vector quantization

• Data compression

• Feature extraction

Unsupervised learning methods can also be combined with supervised ones to enable learning through input-output pairs, as in the BPN.

One such hybrid approach is the counterpropagation network.

Page 3: Unsupervised Learning


Unsupervised/Supervised Learning: The Counterpropagation Network

The counterpropagation network (CPN) is a fast-learning combination of unsupervised and supervised learning.

Although this network uses linear neurons, it can learn nonlinear functions by means of a hidden layer of competitive units.

Moreover, the network is able to learn a function and its inverse at the same time.

However, to simplify things, we will only consider the feedforward mechanism of the CPN.

Page 4: Unsupervised Learning


Distance/Similarity Functions

In the hidden layer, the neuron whose weight vector is most similar to the current input vector is the "winner."

There are different ways of defining such maximal similarity, for example:

(1) Maximal cosine similarity (same as net input):

$s(\mathbf{w}, \mathbf{x}) = \mathbf{w} \cdot \mathbf{x}$

(2) Minimal Euclidean distance:

$d(\mathbf{w}, \mathbf{x}) = \sum_i (w_i - x_i)^2$

(no square root necessary for determining the winner)
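
To make the two winner criteria concrete, here is a minimal Python/NumPy sketch; the function names and the array layout (one weight vector per row) are my own assumptions, not from the lecture:

```python
import numpy as np

def winner_by_cosine(W, x):
    # W: (num_hidden, n) weight matrix, rows assumed normalized to length 1.
    # x: (n,) input vector, also normalized to length 1.
    # With unit vectors, the net input w . x equals the cosine of the angle
    # between w and x, so the unit with the largest net input wins.
    return int(np.argmax(W @ x))

def winner_by_euclidean(W, x):
    # d(w, x) = sum_i (w_i - x_i)^2; the square root is omitted because it
    # does not change which unit attains the minimum.
    return int(np.argmin(np.sum((W - x) ** 2, axis=1)))
```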

Page 5: Unsupervised Learning


The Counterpropagation Network

A simple CPN with two input neurons, three hidden neurons, and two output neurons can be described as follows:

[Figure: network graph. Input layer: $X_1$, $X_2$; hidden layer: $H_1$, $H_2$, $H_3$, connected to the inputs by weights $w^H_{11}, w^H_{12}, w^H_{21}, w^H_{22}, w^H_{31}, w^H_{32}$; output layer: $Y_1$, $Y_2$, connected to the hidden units by weights $w^O_{11}, w^O_{12}, w^O_{13}, w^O_{21}, w^O_{22}, w^O_{23}$.]

Page 6: Unsupervised Learning


The Counterpropagation Network

The CPN learning process (general form for n input units and m output units):

1. Randomly select a vector pair (x, y) from the training set.

2. If you use the cosine similarity function, normalize (shrink/expand to "length" 1) the input vector x by dividing every component of x by the magnitude ||x||, where

$\|\mathbf{x}\| = \sqrt{\sum_{j=1}^{n} x_j^2}$
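
A one-line version of this normalization step, as a small sketch (the zero-vector guard is my addition):

```python
import numpy as np

def normalize(x):
    # Divide every component of x by ||x|| = sqrt(sum_j x_j^2).
    norm = np.linalg.norm(x)
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return x / norm
```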

Page 7: Unsupervised Learning


The Counterpropagation Network

3. Initialize the input neurons with the resulting vector and compute the activation of the hidden-layer units according to the chosen similarity measure.

4. In the hidden (competitive) layer, determine the unit W with the largest activation (the winner).

5. Adjust the connection weights between W and all N input-layer units according to the formula:

$w^H_{Wn}(t+1) = w^H_{Wn}(t) + \alpha \, (x_n - w^H_{Wn}(t))$

6. Repeat steps 1 to 5 until all training patterns have been processed once.
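
Steps 1 to 6 amount to one unsupervised epoch over the hidden layer. A minimal sketch, assuming the squared-Euclidean winner criterion and a fixed step size alpha (all names are mine):

```python
import numpy as np

def hidden_layer_epoch(WH, X, alpha=0.1):
    # WH: (num_hidden, n) hidden-layer weights; X: (num_patterns, n) inputs.
    # One pass over the training set (steps 1-6): for each pattern, find the
    # winner and pull its weight vector toward the input,
    # w(t+1) = w(t) + alpha * (x - w(t)).
    for x in np.random.permutation(X):
        winner = int(np.argmin(np.sum((WH - x) ** 2, axis=1)))  # step 4
        WH[winner] += alpha * (x - WH[winner])                  # step 5
    return WH
```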

Page 8: Unsupervised Learning


The Counterpropagation Network

7. Repeat step 6 until each input pattern is consistently associated with the same competitive unit.

8. Select the first vector pair in the training set (the current pattern).

9. Repeat steps 2 to 4 (normalization, competition) for the current pattern.

10. Adjust the connection weights between the winning hidden-layer unit and all M output-layer units according to the equation:

$w^O_{mW}(t+1) = w^O_{mW}(t) + \beta \, (y_m - w^O_{mW}(t))$
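
Step 10 mirrors the hidden-layer rule but pulls the winner's outgoing weights toward the desired output y. A sketch under the same assumptions, with an assumed step size beta:

```python
import numpy as np

def output_layer_step(WO, WH, x, y, beta=0.1):
    # WO: (m, num_hidden) output weights; column j carries the weights from
    # hidden unit j to all m output units. Steps 2-4 are repeated to find
    # the winner, then only its outgoing weights are moved toward y:
    # w(t+1) = w(t) + beta * (y - w(t)).
    winner = int(np.argmin(np.sum((WH - x) ** 2, axis=1)))
    WO[:, winner] += beta * (y - WO[:, winner])
    return WO
```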

Page 9: Unsupervised Learning


The Counterpropagation Network

11. Repeat steps 9 and 10 for each vector pair in the training set.

12. Repeat steps 8 through 11 for several epochs.
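
Putting steps 1 through 12 together, a compact two-phase training driver; the names, the initialization choices, and the fixed epoch counts standing in for the convergence test of step 7 are all my assumptions:

```python
import numpy as np

def train_cpn(X, Y, num_hidden, alpha=0.1, beta=0.1,
              phase1_epochs=50, phase2_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], Y.shape[1]
    WH = rng.uniform(-1.0, 1.0, size=(num_hidden, n))  # hidden-layer weights
    WO = np.zeros((m, num_hidden))                     # output-layer weights

    def winner(x):
        return int(np.argmin(np.sum((WH - x) ** 2, axis=1)))

    # Phase 1 (steps 1-7): unsupervised competitive learning.
    for _ in range(phase1_epochs):
        for x in rng.permutation(X):
            w = winner(x)
            WH[w] += alpha * (x - WH[w])

    # Phase 2 (steps 8-12): supervised adjustment of the output layer.
    for _ in range(phase2_epochs):
        for x, y in zip(X, Y):
            w = winner(x)
            WO[:, w] += beta * (y - WO[:, w])

    return WH, WO
```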

Page 10: Unsupervised Learning


The Counterpropagation Network

Because in our example network the input is two-dimensional, each unit in the hidden layer has two weights (one for each input connection).

Therefore, the input to the network as well as the weights of the hidden-layer units can be represented and visualized by two-dimensional vectors.

For the current network, all weights in the hidden layer can be completely described by three 2D vectors.

Page 11: Unsupervised Learning


Counterpropagation – Cosine Similarity

This diagram shows a sample state of the hidden layer and a sample input to the network:

[Figure: the hidden-layer weight vectors $(w^H_{11}, w^H_{12})$, $(w^H_{21}, w^H_{22})$, $(w^H_{31}, w^H_{32})$ and the input vector $(x_1, x_2)$, drawn as 2D vectors.]

Page 12: Unsupervised Learning


Counterpropagation – Cosine Similarity

In this example, hidden-layer neuron $H_2$ wins and, according to the learning rule, is moved closer towards the current input vector.

[Figure: the winning weight vector $(w^H_{21}, w^H_{22})$ rotates toward $(x_1, x_2)$, giving the updated vector $(w^H_{\text{new},21}, w^H_{\text{new},22})$.]

Page 13: Unsupervised Learning


Counterpropagation – Cosine Similarity

After doing this through many epochs and slowly reducing the adaptation step size α, each hidden-layer unit will win for a subset of inputs, and the angle of its weight vector will be in the center of gravity of the angles of these inputs.

[Figure: the three weight vectors positioned at the centers of gravity of the angles of all input vectors in the training set.]

Page 14: Unsupervised Learning


Counterpropagation – Euclidean Distance

Example of competitive learning with three hidden neurons:

[Figure: three clusters of training points, marked x, +, and o, together with the initial positions of the weight vectors 1, 2, and 3.]

[Pages 15 through 26 repeat this slide as an animation: with each presented pattern, the winning weight vector moves a step toward that pattern, and over successive frames vectors 1, 2, and 3 migrate into the x, +, and o clusters.]

Page 27: Unsupervised Learning


Counterpropagation – Euclidean Distance

… and so on, possibly with reduction of the learning rate …

Page 28: Unsupervised Learning


Counterpropagation – Euclidean Distance

Example of competitive learning with three hidden neurons:

[Figure: the final state; weight vectors 1, 2, and 3 have each settled in one of the x, +, and o clusters.]

Page 29: Unsupervised Learning


The Counterpropagation Network

After the first phase of the training, each hidden-layer neuron is associated with a subset of input vectors.

The training process minimized the average angle difference or Euclidean distance between the weight vectors and their associated input vectors.

In the second phase of the training, we adjust the weights in the network's output layer in such a way that, for any winning hidden-layer unit, the network's output is as close as possible to the desired output for the winning unit's associated input vectors.

The idea is that when we later use the network to compute functions, the output of the winning hidden-layer unit is 1, and the output of all other hidden-layer units is 0.
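
Recall then reduces to a lookup: the winner's activation is treated as 1 and all other activations as 0, so the network output is simply the column of output weights attached to the winner. A sketch under the same naming assumptions as before:

```python
import numpy as np

def cpn_recall(WH, WO, x):
    # The winning hidden unit is treated as outputting 1 and all others 0,
    # so the network output is exactly the winner's output-weight column.
    winner = int(np.argmin(np.sum((WH - x) ** 2, axis=1)))
    return WO[:, winner].copy()
```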

Page 30: Unsupervised Learning


Counterpropagation – Cosine Similarity

Because there are two output neurons, the weights in the output layer that receive input from the same hidden-layer unit can also be described by 2D vectors. These weight vectors are the only possible output vectors of our network.

[Figure: the output weight vectors $(w^O_{11}, w^O_{21})$, $(w^O_{12}, w^O_{22})$, $(w^O_{13}, w^O_{23})$, labeled as the network output if $H_1$, $H_2$, or $H_3$ wins, respectively.]

Page 31: Unsupervised Learning


Counterpropagation – Cosine Similarity

For each input vector, the output-layer weights that are connected to the winning hidden-layer unit are made more similar to the desired output vector:

[Figure: the vector $(w^O_{11}, w^O_{21})$ moves toward the desired output $(y_1, y_2)$, giving the updated vector $(w^O_{\text{new},11}, w^O_{\text{new},21})$.]

Page 32: Unsupervised Learning


Counterpropagation – Cosine Similarity

The training proceeds with decreasing step size β, and after its termination, the weight vectors are in the center of gravity of their associated output vectors:

[Figure: the output weight vectors $(w^O_{11}, w^O_{21})$, $(w^O_{12}, w^O_{22})$, $(w^O_{13}, w^O_{23})$ at the centers of gravity of the outputs associated with $H_1$, $H_2$, and $H_3$.]

Page 33: Unsupervised Learning


Counterpropagation – Euclidean Distance

At the end of the output-layer learning process, the outputs of the network are at the center of gravity of the desired outputs of the winner neuron.

[Figure: the same three clusters; the possible network outputs 1, 2, and 3 lie at the centers of gravity of the desired outputs associated with each cluster.]

Page 34: Unsupervised Learning


The Counterpropagation Network

Notice:

• In the first training phase, if a hidden-layer unit does not win for a long period of time, its weights should be set to random values to give that unit a chance to win subsequently (see the sketch after this list).

• It is useful to reduce the learning rates α and β during training.

• There is no need for normalizing the training output vectors.

• After the training has finished, the network maps the training inputs onto output vectors that are close to the desired ones.

• The more hidden units, the better the mapping; however, the generalization ability may decrease.

• Thanks to the competitive neurons in the hidden layer, even linear neurons can realize nonlinear mappings.
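
One way to implement the first remark, as a hedged sketch; the win-count bookkeeping is my own device, not something the slides specify:

```python
import numpy as np

def revive_dead_units(WH, win_counts, rng, low=-1.0, high=1.0):
    # Re-randomize the weights of hidden units that have not won at all
    # (win count 0 here), giving them a chance to win subsequently.
    dead = win_counts == 0
    WH[dead] = rng.uniform(low, high, size=(int(dead.sum()), WH.shape[1]))
    return WH
```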