adjoint orbits, principal components, and neural nets some facts about lie groups and examples...

Adjoint Orbits, Principal Components, and Neural Nets

2. Some facts about Lie groups and examples3. Examples of adjoint orbits and a distance measure4. Descent equations on adjoint orbits5. Properties of the double bracket equation6. Smoothed versions of the double bracket equation7. The principal component extractor8. The performance of subspace filters9. Variations on a theme

Where We Are

9:30 - 10:45 Part 1. Examples and Mathematical Background10:45 - 11:15 Coffee break

11:15- 12:30 Part 2. Principal components, Neural Nets, and Automata

12:30 - 14:30 Lunch14:30 - 15:45 Part 3. Precise and Approximate Representation

of Numbers15:45 - 16:15 Coffee break16:15 - 17:30 Part 4. Quantum Computation

The Adjoint Orbit Theory and Some Applications

1. Some facts about Lie groups and examples2. Examples of adjoint orbits and a distance measure3. Descent equations on adjoint orbits4. Properties of the double bracket equation5. Smoothed versions of the double bracket equation6. Loops and deck transformations

Some BackgroundByaLieGroupGweunderstandagroupwithatopologysuchthatmultiplicationandinver-sionarecontinuous.(InthissettingcontinuousimpliesdiÆerentiable.)WesaythatagroupactsonadiÆerentiablemanifoldXvia¡if¡:G£X!MisdiÆeren-tiableand¡(G2G1;x)=¡(G2;¡(G1;x)).ThegroupoforthogonalmatricesSo(n)actsonthen°1-dimensionalsphereviatheaction¡(£;x)=£x

More Mathematics BackgroundAssociatedwitheveryLiegroupisaLiealge-braLwhichmaybethoughtofasdescribinghowGlooksinasmallsetaroundtheidentity.Abstractly,aLiealgebraisavectorspacewithabilinearmapping¡:L£L7!Lsuchthat[L1;L2]=°[L2;L1][L1;[L2;L3]]+[L2;[L3;L1]]+[L3;[L1;L2]]=0TheLiealgebraassociatedwiththerealorthog-onalgroupisthesetofskew-symmetricmatri-cesofthesamedimension.Thebilinearopera-tionisgivenby[≠1;≠2]=≠1≠2°≠2≠1.

A Little More Mathematics BackgroundLet£beanorthogonalmatrixandletQbeasymmetricmatrixwitheigenvalues∏1;∏2;:::;∏n.Theformula£TQ£deØnesagroupactiononSym(∏1;∏2;:::;∏n).Thesetoforthogonalma-tricesisofdimensionn(n°1)=2andthespaceSym(§)isofdimensionn(n+1)=2.ThisactionisbasictoalotMatlab!Theactionofthegroupofunitarymatricesonthespaceofskew-hermitianmatricesvia(U;H)7!UyHUcanbethoughtofasgeneralizingthisac-tion.ItisanexampleofagroupactingonitsownLiealgebra.Thisisanadjointaction.

Still More Mathematics BackgroundConsiderLiealgebraswhoseelementsarenbynmatricesandLiegroupswhoseelementsarenonsingularnbynmatrices.Themappingexp:L7!eLsendstheLiealgebraintothegroupofinvertiblematrices.TheidentityP°1eLP=eP°1LPdeØnestheadjointaction.If¡:G£X!XisagroupactionthenthereisanequivalencerelationonXdeØnedbyxºyify=¡(G1;x)forsomeG12G.Setsofequivalentpointsarecalledorbits.ThesubsetofHΩGsuchthat¡(H;x0)=x0formsasubgroupcalledtheisotropygrouptx0.

The Last for now, Mathematics BackgroundAnyL12LdeØnesvia[L1;¢]:L!L,alineartransformationonaØnitedimensionalspace.ItisoftenwrittenadL1(¢).adL1(adL2(¢))=[L1;[L2;¢]]deØnesalineartransformationonLaswell.ThesumoftheeigenvaluesofthismapdeØneswhatiscalledtheKillingform∑(L1;L2),onL.Forsemisimplecompactgroupssuchastheorthogonalorspecialunitarygroup,theKillingformisnegativedeØniteandpropor-tionaltothemorefamiliartr(≠1≠2).TheKillingformonGdeØnesametricontheadjointorbitcalledthenormalmetric.

Getting a Feel for the Normal MetricExplanation:Considerperturbing£via£7!£(I+≠).Linearizingtheequation£TQ£=HwegetH≠+≠TH=[H;≠]=dHThus ≠=ad°1H(dH)IfHisdiagonalthen!ij=dhij∏i°∏j

Steepest Descent on an Adjoint Orbit

LetQ=QTandN=NTbesymmetricma-tricesandlet£beorthogonal.Considerthefunctiontr£TQ£Nthoughtofasafunctionontheorthogonalmatrices.RelativetotheKillingmetricontheorthogonalgroup,thegradientdescent∞owforminimizingthisfunctionis_£=[£TQ£;N]£Ifwelet£TQ£=HthenthederivativeofHcanbeexpressedas_H=[H;[H;N]]

QuickTime™ and aIntel Indeo® Video R3.2 decompressor

are needed to see this picture.

A Descent Equation on an Adjoint OrbitLetQ=QTandN=NTbesymmetricmatri-cesandlet√(H)bearealvaluedfunctiononSym(§).Whatisthegradientof√(H)?ThegradientonaRiemannianspaceisG°1d√.OnSym(§)theinverseoftheRiemannianmetricisgivenby[H;[H;¢]].andsothedescentequationis _H=°[H;[H;d√(H)]]Thusfor√(H)=tr(HN)wehave_H=°[H;[H;N]].IfNisdiagonalthentrHNachievesitsmini-mumwhenHisdiagonalandsimilarlyorderedwith°N.

A Descent Equation with Multiple Equilibria

If√(H)=°tr(diag(H)H)then_H=[H;[H;2diag(H)]]LetQ=QTandN=NTbediagonalmatriceswithdistincteigenvalues.Thedescentequationis_H=°[H;[H;d√(H)]]Thusfor√(H)=tr(HN)wehave_H=[H;[H;N]]If√(H)=diagHthen_H=2[H;[H;diag(H)]]

A Descent Equation with Smoothing AddedConsiderreplacingthesystem_H=[H;[H;N]]with_H=[H;q(D)P];p(D)P=[H;N]HereD=d=dt.Thissmoothsthesignalsbutdoesnotaltertheequilibriumpoints.StabilityisunaÆectedifq=pisapositiverealfunction.

QuickTime™ and aIntel Indeo® Video R3.2 decompressor

are needed to see this picture.

The Double Bracket Flow for Analog ComputationPrincipalComponentsinRnLearningwithoutateacherissometimesapproachedbyØndingprincipalcomponents._W=x(t)xT(t)-forgettingterm£T(t)W(t)£(t)=diag(∏1:::;∏n)Columnsof£are\components'Theprincipalcomponentsareassembledinahiddenlayer

xxxx

yyyy

ww

wvvvv

1 112 2 23 3 3n n

mmm

Adaptive Subspace Filtering

Filter

t

frequency

power

Let u be a vector of inputs, and let be a diagonal “editing” matrix that selects energy levels that are desirable. An adaptive subspace filter with input u and output y can be realized by implementing the equations

Some Equations

dQdt

=uuT −(1−tr(Q))Q

dΘdt

=[ΘQΘT,N]Θ

y=ΘΘTu

Neural Nets as Flows on Grassmann Manifolds

DenotebyG(n;k)thespaceofk-planesinn-space.ThisspaceisadiÆerentiablemanifoldthatcanbeparameterizedbythesetofallkbynmatricesofrankk.Itisamanifold.AdaptivesubspaceØlterssteertheweightssoastodeØneaparticularelementofthisspace.Thus§£,deØnessuchapointif§lookslike§=266410:::001:::0::::::::::::00:::1

3775

Summary of Part 2

1. We have given some mathematical background necessary to work with flows on adjoint orbits and indicated some applications.

2. We have defined flows that will stabilize at invariant subspaces corresponding to the principal components of a vector process. These flows can be interpreted as flows that learn without a teacher.

3. We have argued that in spite of its limitations, steepest descent is usually the first choice in algorithm design.

4. We have interpreted a basic neural network algorithm as a flow in a Grassmann manifold generated by a steepest descent tracking algorithm.

M. W. Berry et al., “Matrices, Vector Spaces, and Information Retrieval” SIAM Review, vol. 41, No. 2, 1999.

R. W. Brockett, “Dynamical Systems That Learn Subspaces” in Mathematical System Theory: The Influence of R. E. Kalman, (A.C. Antoulas, ed.) Springer -Verlag, Berlin. 1991. pp. 579--592.

R. W. Brockett “An Estimation Theoretic Basis for the Design of Sorting and Classification Networks,” in Neural Networks, (R. Mammone and Y. Zeevi, eds.) Academic Press, 1991, pp. 23-41.

A Few References

adjoint orbits, principal components, and neural nets some facts about lie groups and examples...

Documents

theme slide

adjoint orbit slide

mathematics background

analog computation slide

quantum computation

smoothing added slide

normal metric slide

examples of adjoint