Source: cosc.brocku.ca/~efoxwell/4p80/week7.pdf
TRANSCRIPT
COSC 4P80
Self-Organizing Feature Maps
Week 7
Brock University (Material attributed to Rojas, 1996)
Brock University (Material attributed to Rojas, 1996) (Week 7) Self-Organizing Feature Maps 1 / 17
Recall clustering...
A category of unsupervised learning techniques that, after training on patterns, can then arrange additional patterns.
Patterns that are more similar (according to some metric) are more likely to be grouped together than fundamentally dissimilar patterns
Effectively, like with like
We would prefer to see topologies preserved: clusters with mostly similar characteristics could also be close to each other
◮ Though we haven’t really addressed how to accomplish this just yet...
Consider a problem
We’ve already discussed the idea of taking a large sampling of randomly-selected colours, and grouping them together into whatever arrangement the algorithm might choose
e.g. for K-Means Clustering
Maybe we could do an incredibly brief pseudo-example on the board?
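A board-style pseudo-example can be sketched in a few lines of code: a minimal k-means pass over random RGB colours. The cluster count, random seed, and iteration cap below are arbitrary illustration choices, not values from the slides.

```python
# Minimal k-means sketch on random RGB colours (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
colours = rng.random((200, 3))          # 200 random RGB triples in [0, 1)
k = 4
centroids = colours[rng.choice(len(colours), k, replace=False)]

for _ in range(10):
    # Assign each colour to its nearest centroid (Euclidean distance).
    dists = np.linalg.norm(colours[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of its assigned colours.
    for j in range(k):
        if np.any(labels == j):
            centroids[j] = colours[labels == j].mean(axis=0)

print(centroids.shape)  # (4, 3): one exemplar colour per cluster
```

Note there is no notion of adjacency here: the four exemplars have no arrangement relative to one another, which is exactly the gap the rest of the lecture addresses.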
Colours are pretty
But how do we handle similarities?
Let’s take it one step further, and consider topologies
That is, not only assigning the patterns to nodes/vectors, but also arranging the vectors to nudge adjacent nodes towards preserving relative positioning, etc.
Example: Colours
Consider the following colours, corresponding to different neurons/vectors
That is, these are the network, not the input patterns
Example: Colours (cont)
Through some form of wizardry, we’d like to see something like this happen:
Example: Colours (additional)
Note: some settings were tweaked there to show the progression on a projector. This would actually probably be a better result:
Topologies
I keep saying “Like with Like”, but what does that mean?
It means that we’re actually preserving the mathematical similarities when mapping a set of data into another space
◮ Irrespective of whether we’re maintaining or reducing the number of dimensions
For example:
Credit: http://commons.wikimedia.org/wiki/User:Mcld
Our Approach
Our algorithm will:
Have a ‘network’ that’s mostly just a collection of vectors mapping from inputs
Use its vectors as exemplars reflecting their corresponding clusters/nodes
For a given pattern, choose the vector that most closely resembles that pattern
Update its exemplars to better reflect the cluster’s members
Wait... isn’t that just K-Means clustering?!?
Our Approach: The difference
No! Because now we’re going to introduce one more feature: a neighbourhood!
Remember that we’d like to incorporate some notion of topologies
We achieve this by, during the training phase, acknowledging all of the other nodes that might have been chosen
◮ e.g. adjacent nodes, or even just nodes that were reasonably close
In addition to training the selected node, we’ll also be training those nodes within the neighbourhood!
◮ Of course, it’s worth pointing out that the selected node will normally be trained more than those that were simply close
The net result is exemplars that reflect their members very well, and adjacent nodes that also bear some resemblance
◮ i.e. similar in input space → similar in clustering
Disclaimer
This is where we could discuss the biological reasoning behind this approach, and point out the similarities with how the human brain tends to associate information, etc.
But personally I think that’s immaterial
The real reason we’ll be using this is because it works
Self-Organizing Feature Maps: Kohonen Networks
Teuvo Kohonen came up with this nifty idea. It’s really just an extension of what we’ve already seen.
We can have our nodes arranged in a line, 2D grid, or any other dimensionality, but 2D’s pretty common
We’ll need to define a way of deciding the size of a neighbourhood
◮ Since we’ll want it to fine-tune eventually, the radius of that neighbourhood will shrink over time, until it includes only the selected node
For the same reason, we’ll apply a learning rate, but also reduce that over time
Of course, since our formulae will have a built-in mechanism for removing training (after some number of epochs), it’s mathematically certain to eventually converge
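The shrinking neighbourhood and learning rate are both simple exponential decays; a tiny sketch, where the starting values and time constant (sigma_0, eta_0, lam) are arbitrary illustration choices:

```python
# Exponential decay schedules for neighbourhood radius and learning
# rate (sigma_0, eta_0, and lam are arbitrary illustration values).
import math

sigma_0, eta_0, lam = 5.0, 0.1, 100.0

def sigma(t):            # neighbourhood radius at epoch t
    return sigma_0 * math.exp(-t / lam)

def eta(t):              # learning rate at epoch t
    return eta_0 * math.exp(-t / lam)

# Both shrink toward zero, so the updates vanish and the map settles.
print(sigma(0), sigma(500))   # 5.0 vs ~0.034
print(eta(0), eta(500))       # 0.1 vs ~0.00067
```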
SOMs: The Math
Weight Update Rule:
$\vec{w}_i^{\,t+1} = \vec{w}_i^{\,t} + \phi(t+1)\,\eta(t+1)\,(\vec{v} - \vec{w}_i^{\,t})$
Neighbourhood Multiplier:
$\phi(t) = e^{-\frac{\mathrm{dist}^2}{2\sigma^2(t)}}$ where dist is the distance from the node to the BMU
Neighbourhood Size:
$\sigma(t) = \sigma_0\, e^{-t/\lambda}$
Learning Rate:
$\eta(t) = \eta_0\, e^{-t/\lambda}$
Distance between nodes:
$\mathrm{dist} = \sqrt{\sum_{i=0}^{n} (V_i - W_i)^2}$
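These formulas can be combined into one training step on a 2-D grid. A minimal sketch, assuming an 8x8 grid of RGB weight vectors; the grid size and the sigma_0/eta_0/lam constants are arbitrary illustration values, not prescribed by the slides:

```python
# One SOM training step: find the BMU, then pull every node toward the
# input, weighted by a Gaussian neighbourhood around the BMU.
import numpy as np

rng = np.random.default_rng(1)
rows, cols, dim = 8, 8, 3            # 8x8 grid of 3-D (RGB) weight vectors
weights = rng.random((rows, cols, dim))
sigma_0, eta_0, lam = 4.0, 0.5, 200.0

def train_step(weights, v, t):
    # Best Matching Unit: the node whose weight vector is closest to v.
    d = np.linalg.norm(weights - v, axis=2)
    bmu = np.unravel_index(d.argmin(), d.shape)

    # Decayed neighbourhood radius and learning rate.
    sigma = sigma_0 * np.exp(-t / lam)
    eta = eta_0 * np.exp(-t / lam)

    # Squared grid distance from every node to the BMU.
    rr, cc = np.indices((weights.shape[0], weights.shape[1]))
    dist2 = (rr - bmu[0]) ** 2 + (cc - bmu[1]) ** 2

    # Gaussian neighbourhood multiplier phi, then the weight update.
    phi = np.exp(-dist2 / (2.0 * sigma ** 2))
    weights += (phi * eta)[:, :, None] * (v - weights)
    return bmu

v = rng.random(3)                    # one input pattern
bmu = train_step(weights, v, t=0)
```

Note the BMU itself gets the full update (phi = 1), while nodes farther away on the grid move less, which is exactly the “train the neighbourhood too, but the winner most” idea from earlier.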
Applications
Of course, SOMs can be used for anything requiring clustering. Though,because they’re so easy to implement and effective, they’ve been appliedto a wide variety of problems.
They can often be used as a sort of ‘pre-learning’ tool
◮ Funnel the data through a Kohonen network, and then apply a different learning algorithm to that data!
Reduction of dimensionality
◮ Related to the previous point, SOMs can be a good way to transform data
⋆ Sometimes important features may be lost, but sometimes it simplifies the problem
You might want to take a glance here: http://www.ai-junkie.com/ann/som/som5.html
See Also...
You might also be interested in researching neural gas, a related topic.
Heck, it might even make a good seminar topic!◮ Hint
⋆ (hint)
◮ Then again, Learning Vector Quantization (LVQ) might as well...
⋆ (hint hint)
Additional Reading
Of course, the (Rojas) book
AI Junkie: http://www.ai-junkie.com/ann/som/som1.html
https://www.cs.hmc.edu/~kpang/nn/som.html
https://www.hindawi.com/journals/cin/2017/4263064/
http://www.mperfect.net/aisompic/
◮ Note: gets a bit... creepy towards the latter half
Questions? Comments?
Funny anecdotes?