Virginia de Sa — UCSD Cognitive Science, CogSci 118b: Hebbian ...
TRANSCRIPT
2 Unsupervised Learning
No target data required
Extract structure (density estimates, cluster memberships, or a reduced-dimensional representation) from the data
3 Unsupervised algorithms are often forms of Hebbian Learning
Hebbian learning refers to modifying the strength of a connection according to a function of the input and output activity (often simply the product).
It is based on a rule specified by the Canadian psychologist Donald Hebb in his 1949 book “The Organization of Behavior”
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased (Hebb 1949). (Figure below from http://www.qub.ac.uk/mgt/intsys/nnbiol.html)
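As a rough illustration of the rule, here is a minimal sketch of a plain Hebbian update for a single linear unit. The toy data, the learning rate, and the per-step weight normalization (added only to keep the weights from growing without bound, which plain Hebbian learning does not prevent) are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([2.0, 0.5])   # toy inputs, more variance along the first axis
w = rng.normal(size=2)
eta = 0.01                                              # learning rate (illustrative)

for x in X:
    y = w @ x                 # output activity of a linear unit
    w += eta * y * x          # Hebbian rule: weight change = product of input and output activity
    w /= np.linalg.norm(w)    # renormalize so the weights do not grow without bound

print(w)  # with normalization, w ends up near the direction of greatest variance in the data
```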
4 Data Compression
We might want to compress data from high-dimensional spaces for several reasons:
• to enable us (and also machine learning algorithms) to better see relationships
• for more efficient storage and transmission of information (gzip, jpg)
We want to do this while preserving as much of the useful information as possible. (Of course, how usefulness is determined is critical.)
Clustering and PCA are different methods of dimensionality reduction.
5 PCA and Clustering
PCA represents a point using a smaller number of dimensions. The directions are the directions of greatest variance in the data.
Clustering represents a point using prototype points.
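To make the PCA side concrete, here is a minimal sketch (assuming NumPy and a data matrix with one pattern per row) of projecting points onto the directions of greatest variance; the function name and the default of two components are made up for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the top principal directions (directions of greatest variance)."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / (len(X) - 1)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]   # columns = directions of greatest variance
    return Xc @ top                            # reduced-dimensional representation
```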
6 K-means
A simple but effective clustering algorithm that partitions the data into K disjoint sets (clusters).
Iterative batch algorithm:
• Start with an initial guess of the K centers µ(j)
• Let S(j) be the set of all points closest to µ(j)
• Update each center to the mean of its assigned points:
  µ(j) = (1/Nj) ∑_{n∈S(j)} x(n)
• Repeat until there is no change in the means
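A minimal NumPy sketch of this batch algorithm; the function name, the random initialization from the data points, and the iteration cap are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Batch k-means: X is an (N, d) array of patterns; returns the centers and assignments."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()   # initial guess of the K centers
    for _ in range(n_iters):
        # S(j): assign each point to its closest center
        labels = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)
        # update each center to the mean of its assigned points
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # stop when the means no longer change
            break
        mu = new_mu
    return mu, labels
```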
18 Stochastic K-means = Competitive Learning
Find the weight vector w(j) that minimizes ||w(j) − x(n)|| (the weight closest to the pattern)
and move it closer to the pattern:
∆w(j) = η(t)(x(n) − w(j))
Decrease the learning rate η(t) with time.
[Figure: network with inputs x1, x2, x3, x4 connected by weights W to the output units.]
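A sketch of this stochastic (online) version, assuming patterns are visited in random order and the learning rate decays as η0/(1 + t); the exact schedule and initialization are not specified on the slide and are chosen here only for illustration.

```python
import numpy as np

def competitive_learning(X, k, n_epochs=20, eta0=0.5, seed=0):
    """Online competitive learning: move the winning weight vector toward each pattern."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].copy()   # initialize weights at data points
    t = 0
    for _ in range(n_epochs):
        for n in rng.permutation(len(X)):
            eta = eta0 / (1.0 + t)                             # decrease learning rate with time
            j = np.argmin(np.sum((W - X[n]) ** 2, axis=1))     # winner: weight closest to x(n)
            W[j] += eta * (X[n] - W[j])                        # Δw(j) = η(t)(x(n) − w(j))
            t += 1
    return W
```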
19 Competitive Learning
[Figure: competitive learning trajectories. The patterns have been augmented and normalized and hence lie on a two-dimensional sphere in three dimensions; likewise, the weights of the three cluster centers have been normalized. The red curves show the trajectory of the weight vectors, which start at the red points and end at the center of a cluster. From Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, © 2001 John Wiley & Sons.]
37 Kohonen Feature Mapping
Update the neighbours (in the output topography) as well as the winner. If y∗ refers to the winning output neuron, then we update the weights:
∆w(k) = η(t) Λ(|y(k) − y∗|, t) (x − w(k))
The window function Λ decreases with time.
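A sketch of this update for a one-dimensional output topography, assuming a Gaussian window Λ and linearly decaying learning rate and window width; the slide does not commit to these particular choices, so treat them as illustrative.

```python
import numpy as np

def train_som(X, n_units=20, n_epochs=30, eta0=0.5, sigma0=5.0, seed=0):
    """1-D Kohonen map: update the winner and its topographic neighbours toward each pattern."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_units, X.shape[1]))
    y = np.arange(n_units)                      # output positions on a 1-D lattice
    T = n_epochs * len(X)
    t = 0
    for _ in range(n_epochs):
        for n in rng.permutation(len(X)):
            eta = eta0 * (1 - t / T)                               # learning rate decreases with time
            sigma = sigma0 * (1 - t / T) + 0.01                    # window width decreases with time
            winner = np.argmin(((W - X[n]) ** 2).sum(axis=1))      # y*
            Lam = np.exp(-((y - winner) ** 2) / (2 * sigma ** 2))  # Λ(|y(k) − y*|, t)
            W += eta * Lam[:, None] * (X[n] - W)                   # Δw(k) = η(t) Λ (x − w(k))
            t += 1
    return W
```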
38 Kohonen Feature Mapping
[Figure: a self-organizing map from the (two-dimensional) disk source space to the (one-dimensional) line of the target space. For each sensed point, the learning rule moves the weight of the most active unit toward that point, and because of the window function its neighbours move as well. From Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, © 2001 John Wiley & Sons.]
39 Kohonen Feature Mapping
[Figure: window functions used for self-organizing maps with a one-dimensional target space (left) and a two-dimensional target space (right). In each case the weights of the maximally active unit are changed the most, and units farther away in the output topography are updated less. From Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, © 2001 John Wiley & Sons.]
40 Kohonen Feature Mapping
[Figure: snapshots of a self-organizing map during training; the number of pattern presentations is listed for each state. From Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, © 2001 John Wiley & Sons.]
41 Kohonen Feature Mapping
[Figure: a self-organizing map from a square source space to a square grid target space; each grid point is plotted at the point in the source space that maximally excites that target point. From Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, © 2001 John Wiley & Sons.]
42 Kohonen Feature Mapping
[Figure: an unfortunate random sequence of patterns can lead to kinks in the map; even extensive further training does not eliminate the kink. In such cases learning should be restarted with randomized weights and possibly a wider window function and slower decay of the learning rate. From Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, © 2001 John Wiley & Sons.]
43 More examples
http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/JavaPaper/node24.html
44 Some SOM applets
applet from rfhs8012.fh-regensburg.de/ saj39122/jfroehl/diplom/e-sample.html
applet from www.patol.com/java/fill/index.html
applet from www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html
45 Let’s look at a visual cortex example
Obermayer1990.pdf http://www.pnas.org/cgi/reprint/87/21/8345
47 Aside: related supervised algorithms (Kohonen’s Learning Vector Quantization)
Supervised methods for moving cluster centers (they make use of the given class labels).
Can have more than one center per class.
Move the centers to reduce the number of misclassified patterns.
There are various flavours; LVQ2.1 minimizes the number of misclassified patterns.
48 LVQ2.1 Learning rule
Let w(i) and w(j) be the two closest codebook vectors.
Only if exactly one of w(i) and w(j) belongs to the correct class and min(||x − w(i)||/||x − w(j)||, ||x − w(j)||/||x − w(i)||) < s (x lies within a window of the border region) do we apply the following (the rules below assume w(i) is from the correct class; switch the rules if not):
• w(i) = w(i) + ε(x − w(i))
• w(j) = w(j) − ε(x − w(j))
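A minimal NumPy sketch of one such update, using the window test exactly as written above. The codebook label array, the parameter values ε and s, and the in-place modification of W are illustrative assumptions, not part of the slide.

```python
import numpy as np

def lvq21_step(x, label, W, W_labels, eps=0.05, s=0.7):
    """One LVQ2.1 update: W is the codebook (rows), W_labels the class of each codebook vector."""
    d = np.linalg.norm(W - x, axis=1)
    i, j = np.argsort(d)[:2]                       # the two closest codebook vectors
    correct = (W_labels[i] == label, W_labels[j] == label)
    if sum(correct) != 1:                          # exactly one must belong to the correct class
        return W
    if min(d[i] / d[j], d[j] / d[i]) >= s:         # window test, as stated on the slide
        return W
    if not correct[0]:
        i, j = j, i                                # make w(i) the correct-class vector
    W[i] += eps * (x - W[i])                       # pull the correct vector toward x
    W[j] -= eps * (x - W[j])                       # push the incorrect vector away from x
    return W
```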
49 Improved LVQ2.1 Learning rule
Let w(i) and w(j) be the two closest codebook vectors.
Only if exactly one of w(i) and w(j) belongs to the correct class and min(||x − w(i)||/||x − w(j)||, ||x − w(j)||/||x − w(i)||) < s(t) (x lies within a window of the border region that decreases with time) do we apply the following (the rules below assume w(i) is from the correct class; switch the rules if not):
• w(i) = w(i) + ε(x − w(i))/||x − w(i)||
• w(j) = w(j) − ε(x − w(j))/||x − w(j)||
50 LVQ2.1 in 2-D
• w(i) = w(i) + ε(x − w(i))/||x − w(i)||
• w(j) = w(j) − ε(x − w(j))/||x − w(j)||
w(i) is from the correct class, w(j) from an incorrect class
[Figure: points x1 and x2 in 2-D with codebook vectors ya, yb, yc, yd, ye.]
51 LVQ in 1-D
[Figure: class-conditional curves P(C_A)p(x|C_A) and P(C_B)p(x|C_B) along x, with the Class A and Class B decision regions marked. Arrows show the force on codebook vectors (to the left or to the right) under LVQ 2.0 (top panel) and LVQ 2.1 (bottom panel).]
52 LVQ in 1-D, Separable Distributions
[Figure: the same comparison for separable distributions: P(C_A)p(x|C_A) and P(C_B)p(x|C_B) along x with the Class A and Class B decision regions, and the forces on codebook vectors (to the left or to the right) under LVQ 2.0 and LVQ 2.1.]
54 Solution
Model the clusters as Gaussians, learn the covariance ellipses from the data, and use the probabilities associated with the Gaussian densities to determine membership.
55 Mixture of Gaussians (MOG) = A softer k-means
Model the data as coming from a mixture of Gaussians, where you don’t know which Gaussian generated which data point.
Each Gaussian cluster has an associated proportion or prior probability πk
p(x) = ∑_{k=1}^{c} πk pk(x)
In the mixture-of-Gaussians case, pk(x) ∼ N(µ(k), Σk):
pk(x) = (1/|2πΣk|^0.5) exp(−(1/2)(x − µ(k))ᵀ Σk⁻¹ (x − µ(k)))
Mixture models can be generalized (the component densities need not be Gaussian).
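As a concrete reading of these formulas, here is a small sketch that evaluates the mixture density at a single point x; the helper names and argument conventions (lists of mixing proportions, means, and covariances) are made up for illustration.

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """pk(x) = (1/|2πΣ|^0.5) exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))"""
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(np.linalg.det(2 * np.pi * Sigma))

def mixture_density(x, pis, mus, Sigmas):
    """p(x) = ∑_k πk pk(x)"""
    return sum(pi_k * gauss_pdf(x, mu_k, Sig_k)
               for pi_k, mu_k, Sig_k in zip(pis, mus, Sigmas))
```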
56 MOG Solution
Normalize the probabilities to determine the responsibility of each cluster for each data point (soft responsibility).
rk(x(n)) = πk pk(x(n)) / ∑i πi pi(x(n))
Now solve similarly to the k-means solution: recompute the mean, covariance, and overall weighting for each cluster, with each data point contributing weight according to its responsibility. Then iterate as in k-means.
µ(k) = ∑n rk(x(n)) x(n) / ∑n rk(x(n))
Σk = ∑n rk(x(n)) (x(n) − µ(k))² / ∑n rk(x(n))
πk = ∑n rk(x(n)) / ∑i ∑n ri(x(n))
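A sketch of one full E-step/M-step iteration implementing these updates. The array shapes, the in-place updates of the parameter lists, and the redefinition of the gauss_pdf helper (same as in the density sketch above) are implementation choices made for illustration, not prescribed by the slides.

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(np.linalg.det(2 * np.pi * Sigma))

def mog_em_step(X, pis, mus, Sigmas):
    """One EM iteration: soft responsibilities, then responsibility-weighted re-estimates."""
    N, c = len(X), len(pis)
    # E-step: responsibility of each cluster for each data point, normalized over clusters
    R = np.array([[pis[k] * gauss_pdf(x, mus[k], Sigmas[k]) for k in range(c)] for x in X])
    R /= R.sum(axis=1, keepdims=True)
    # M-step: each point contributes weight according to its responsibility
    Nk = R.sum(axis=0)
    for k in range(c):
        mus[k] = R[:, k] @ X / Nk[k]                            # weighted mean
        diff = X - mus[k]
        Sigmas[k] = (R[:, k, None] * diff).T @ diff / Nk[k]     # weighted covariance
        pis[k] = Nk[k] / Nk.sum()                               # mixing proportion
    return pis, mus, Sigmas
```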
57 Issues with MOG
Quite sensitive to initial conditions (applet)
It’s a good idea to initialize with k-means.
There are a large number of parameters. We can reduce the number of parameters (as sketched after this list) by:
a) constraining the Gaussians to have diagonal covariance matrices
b) constraining the Gaussians to share the same covariance matrix
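A rough illustration of the two constraints applied to a set of fitted covariance matrices. The toy matrices are made up, and a proper constrained M-step would build these forms directly from the responsibility-weighted sums rather than fixing up full covariances afterwards, so treat this only as a sketch.

```python
import numpy as np

# Toy fitted covariance matrices, one per cluster (stand-ins for the Σk from the EM sketch above)
Sigmas = [np.array([[2.0, 0.5], [0.5, 1.0]]),
          np.array([[1.0, -0.3], [-0.3, 0.8]])]

Sigmas_diag = [np.diag(np.diag(S)) for S in Sigmas]   # (a) keep only the diagonal of each Σk
Sigma_shared = sum(Sigmas) / len(Sigmas)              # (b) a single covariance shared by all clusters
Sigmas_shared = [Sigma_shared.copy() for _ in Sigmas]
```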
58 Gaussians (slides by Andrew W. Moore, School of Computer Science, Carnegie Mellon University; http://www.cs.cmu.edu/~awm/tutorials; © 2001, Andrew W. Moore)
Gaussians in Data Mining:
• Why we should care
• The entropy of a PDF
• Univariate Gaussians
• Multivariate Gaussians
• Bayes Rule and Gaussians
• Maximum Likelihood and MAP using Gaussians
59–64 Clustering with Gaussian Mixtures (slides by Andrew W. Moore, © 2001, 2004)
[Figures: the Gaussian mixture fit after the first, 2nd, 3rd, 4th, 5th, and 6th EM iterations (Moore, slides 41–46).]