Adjoint Orbits, Principal Components, and Neural Nets
2. Some facts about Lie groups and examples
3. Examples of adjoint orbits and a distance measure
4. Descent equations on adjoint orbits
5. Properties of the double bracket equation
6. Smoothed versions of the double bracket equation
7. The principal component extractor
8. The performance of subspace filters
9. Variations on a theme
Where We Are
9:30 - 10:45 Part 1. Examples and Mathematical Background
10:45 - 11:15 Coffee break
11:15 - 12:30 Part 2. Principal components, Neural Nets, and Automata
12:30 - 14:30 Lunch
14:30 - 15:45 Part 3. Precise and Approximate Representation of Numbers
15:45 - 16:15 Coffee break
16:15 - 17:30 Part 4. Quantum Computation
The Adjoint Orbit Theory and Some Applications
1. Some facts about Lie groups and examples
2. Examples of adjoint orbits and a distance measure
3. Descent equations on adjoint orbits
4. Properties of the double bracket equation
5. Smoothed versions of the double bracket equation
6. Loops and deck transformations
Some Background
By a Lie group $G$ we understand a group with a topology such that multiplication and inversion are continuous. (In this setting continuous implies differentiable.) We say that a group acts on a differentiable manifold $X$ via $\phi$ if $\phi : G \times X \to X$ is differentiable and $\phi(G_2 G_1, x) = \phi(G_2, \phi(G_1, x))$. The group of orthogonal matrices $So(n)$ acts on the $(n-1)$-dimensional sphere via the action $\phi(\Theta, x) = \Theta x$.
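As a quick numerical illustration of the composition law for an action, the sketch below checks it for orthogonal matrices acting on the unit sphere in three dimensions. The helper names (`random_orthogonal`, `act`) and the random test data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Orthogonal matrix taken from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def act(theta, x):
    """The action phi(Theta, x) = Theta x."""
    return theta @ x

rng = np.random.default_rng(0)
n = 3
g1, g2 = random_orthogonal(n, rng), random_orthogonal(n, rng)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)            # a point on the unit sphere

lhs = act(g2 @ g1, x)             # phi(G2 G1, x)
rhs = act(g2, act(g1, x))         # phi(G2, phi(G1, x))
print(np.allclose(lhs, rhs))                   # composition law
print(np.isclose(np.linalg.norm(lhs), 1.0))    # the image stays on the sphere
```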
More Mathematics Background
Associated with every Lie group is a Lie algebra $L$ which may be thought of as describing how $G$ looks in a small set around the identity. Abstractly, a Lie algebra is a vector space with a bilinear mapping $\phi : L \times L \to L$, written $[L_1, L_2]$, such that
\[ [L_1, L_2] = -[L_2, L_1] \]
\[ [L_1, [L_2, L_3]] + [L_2, [L_3, L_1]] + [L_3, [L_1, L_2]] = 0 \]
The Lie algebra associated with the real orthogonal group is the set of skew-symmetric matrices of the same dimension. The bilinear operation is given by $[\Omega_1, \Omega_2] = \Omega_1 \Omega_2 - \Omega_2 \Omega_1$.
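The two defining identities can be checked numerically for the commutator on skew-symmetric matrices. This is a minimal sketch on random 4 by 4 examples; the function names are illustrative.

```python
import numpy as np

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a

def random_skew(n, rng):
    """Random skew-symmetric matrix M - M^T."""
    m = rng.standard_normal((n, n))
    return m - m.T

rng = np.random.default_rng(1)
w1, w2, w3 = (random_skew(4, rng) for _ in range(3))

closed = np.allclose(bracket(w1, w2), -bracket(w1, w2).T)    # the bracket is again skew-symmetric
antisym = np.allclose(bracket(w1, w2), -bracket(w2, w1))     # [L1, L2] = -[L2, L1]
jacobi = (bracket(w1, bracket(w2, w3))
          + bracket(w2, bracket(w3, w1))
          + bracket(w3, bracket(w1, w2)))                     # should vanish
print(closed, antisym, np.abs(jacobi).max())
```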
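A Little More Mathematics Background
Let $\Theta$ be an orthogonal matrix and let $Q$ be a symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. The formula $\Theta^T Q \Theta$ defines a group action on $\mathrm{Sym}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. The set of orthogonal matrices is of dimension $n(n-1)/2$ and the space of symmetric matrices, in which $\mathrm{Sym}(\Lambda)$ sits, is of dimension $n(n+1)/2$. This action is basic to a lot of Matlab!
The action of the group of unitary matrices on the space of skew-hermitian matrices via $(U, H) \mapsto U^\dagger H U$ can be thought of as generalizing this action. It is an example of a group acting on its own Lie algebra. This is an adjoint action.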
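A small numerical sketch of this action: conjugating a symmetric matrix by an orthogonal matrix leaves the eigenvalues unchanged, and acting twice equals acting once with the product. The random matrices and the helper `act` are illustrative assumptions.

```python
import numpy as np

def act(theta, sym):
    """The action (Theta, Q) -> Theta^T Q Theta."""
    return theta.T @ sym @ theta

rng = np.random.default_rng(2)
n = 4
a = rng.standard_normal((n, n))
q = a + a.T                                            # a symmetric matrix Q
theta1, _ = np.linalg.qr(rng.standard_normal((n, n)))
theta2, _ = np.linalg.qr(rng.standard_normal((n, n)))

h = act(theta1, q)
same_spectrum = np.allclose(np.linalg.eigvalsh(q), np.linalg.eigvalsh(h))
# Acting with theta1 and then theta2 equals acting once with theta1 @ theta2.
composes = np.allclose(act(theta2, act(theta1, q)), act(theta1 @ theta2, q))
# Dimensions of the orthogonal group and of the symmetric n-by-n matrices.
dims = (n * (n - 1) // 2, n * (n + 1) // 2)
print(same_spectrum, composes, dims)
```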
Still More Mathematics Background
Consider Lie algebras whose elements are $n$ by $n$ matrices and Lie groups whose elements are nonsingular $n$ by $n$ matrices. The mapping $\exp : L \mapsto e^L$ sends the Lie algebra into the group of invertible matrices. The identity $P^{-1} e^{L} P = e^{P^{-1} L P}$ defines the adjoint action.
If $\phi : G \times X \to X$ is a group action then there is an equivalence relation on $X$ defined by $x \approx y$ if $y = \phi(G_1, x)$ for some $G_1 \in G$. Sets of equivalent points are called orbits. The subset $H \subset G$ such that $\phi(H, x_0) = x_0$ forms a subgroup called the isotropy group at $x_0$.
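The identity behind the adjoint action can be checked numerically. The sketch below uses a small hand-written matrix exponential (scaling and squaring with a short power series) so that it depends only on NumPy; that routine and the random test matrices are assumptions made for illustration. It also shows one element of an isotropy group: a rotation about the third axis fixes the point $x_0 = (0, 0, 1)$ on the sphere.

```python
import numpy as np

def expm(a, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a short power series."""
    x = a / 2.0**squarings
    eye = np.eye(a.shape[0])
    result, term = eye.copy(), eye.copy()
    for k in range(1, terms + 1):
        term = term @ x / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

rng = np.random.default_rng(3)
n = 3
ell = rng.standard_normal((n, n))               # an element L of the matrix Lie algebra
p = expm(0.5 * rng.standard_normal((n, n)))     # a nonsingular P, built as an exponential
p_inv = np.linalg.inv(p)

lhs = p_inv @ expm(ell) @ p                     # P^{-1} e^L P
rhs = expm(p_inv @ ell @ p)                     # e^{P^{-1} L P}
print(np.allclose(lhs, rhs))

angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
x0 = np.array([0.0, 0.0, 1.0])
fixes_x0 = np.allclose(rot_z @ x0, x0)          # rot_z belongs to the isotropy group at x0
print(fixes_x0)
```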
The Last, for Now, Mathematics Background
Any $L_1 \in L$ defines via $[L_1, \cdot] : L \to L$ a linear transformation on a finite dimensional space. It is often written $\mathrm{ad}_{L_1}(\cdot)$. The composition $\mathrm{ad}_{L_1}(\mathrm{ad}_{L_2}(\cdot)) = [L_1, [L_2, \cdot]]$ defines a linear transformation on $L$ as well. The sum of the eigenvalues of this map defines what is called the Killing form $\kappa(L_1, L_2)$ on $L$. For semisimple compact groups such as the orthogonal or special unitary group, the Killing form is negative definite and proportional to the more familiar $\mathrm{tr}(\Omega_1 \Omega_2)$.
The Killing form on $G$ defines a metric on the adjoint orbit called the normal metric.
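To make the Killing form concrete, the sketch below builds the matrix of $\mathrm{ad}_X$ on the skew-symmetric 4 by 4 matrices in an explicit basis and compares $\mathrm{tr}(\mathrm{ad}_X \mathrm{ad}_Y)$ with $\mathrm{tr}(XY)$. The basis choice and helper names are illustrative; the proportionality constant $n - 2$ used in the comparison is the standard value for the orthogonal Lie algebra and is not stated in the slides.

```python
import numpy as np
from itertools import combinations

def skew_basis(n):
    """Basis E_ij (i < j) of skew-symmetric matrices: +1 at (i, j), -1 at (j, i)."""
    basis = []
    for i, j in combinations(range(n), 2):
        e = np.zeros((n, n))
        e[i, j], e[j, i] = 1.0, -1.0
        basis.append(e)
    return basis

def coords(m, n):
    """Coordinates of a skew-symmetric matrix in the basis above."""
    return np.array([m[i, j] for i, j in combinations(range(n), 2)])

def ad_matrix(x, basis, n):
    """Matrix of ad_x = [x, .] acting on coordinates."""
    return np.column_stack([coords(x @ e - e @ x, n) for e in basis])

def killing(x, y, basis, n):
    """Killing form: trace (sum of eigenvalues) of ad_x composed with ad_y."""
    return np.trace(ad_matrix(x, basis, n) @ ad_matrix(y, basis, n))

rng = np.random.default_rng(4)
n = 4
basis = skew_basis(n)
mx, my = rng.standard_normal((n, n)), rng.standard_normal((n, n))
x, y = mx - mx.T, my - my.T

print(killing(x, y, basis, n), (n - 2) * np.trace(x @ y))   # the two numbers should agree
print(killing(x, x, basis, n) < 0)                           # negative definiteness on this sample
```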
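Getting a Feel for the Normal Metric
Explanation: Consider perturbing $\Theta$ via $\Theta \mapsto \Theta(I + \Omega)$. Linearizing the equation $\Theta^T Q \Theta = H$ we get
\[ H\Omega + \Omega^T H = [H, \Omega] = dH \]
Thus
\[ \Omega = \mathrm{ad}_H^{-1}(dH) \]
If $H$ is diagonal then
\[ \omega_{ij} = \frac{dh_{ij}}{\lambda_i - \lambda_j} \]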
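The linearization can be checked numerically at a diagonal $H$ with distinct entries: a small skew-symmetric $\Omega$ changes $H$ by $[H, \Omega]$ to first order, and $\Omega$ is recovered from $dH$ by dividing by the eigenvalue gaps. The particular numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = np.array([3.0, 1.0, -2.0])           # distinct diagonal entries of H
h = np.diag(lam)

m = rng.standard_normal((3, 3))
omega = 1e-6 * (m - m.T)                   # a small skew-symmetric perturbation
theta = np.eye(3) + omega                  # the perturbation I + Omega, taken at Theta = I

dh = theta.T @ h @ theta - h               # actual change in H
first_order = h @ omega - omega @ h        # the linearization [H, Omega]

gaps = lam[:, None] - lam[None, :]         # lambda_i - lambda_j
off = ~np.eye(3, dtype=bool)               # off-diagonal positions, where the gaps are nonzero
omega_rec = np.zeros((3, 3))
omega_rec[off] = dh[off] / gaps[off]       # omega_ij = dh_ij / (lambda_i - lambda_j)

print(np.abs(dh - first_order).max())      # second-order small
print(np.abs(omega_rec - omega).max())
```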
Steepest Descent on an Adjoint Orbit
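Let $Q = Q^T$ and $N = N^T$ be symmetric matrices and let $\Theta$ be orthogonal. Consider the function $\mathrm{tr}(\Theta^T Q \Theta N)$ thought of as a function on the orthogonal matrices. Relative to the Killing metric on the orthogonal group, the gradient descent flow for minimizing this function is
\[ \dot\Theta = -\Theta\,[\Theta^T Q \Theta, N] \]
If we let $\Theta^T Q \Theta = H$ then, because $[H, N]$ is skew-symmetric, the derivative of $H$ can be expressed as
\[ \dot H = -[H, [H, N]] \]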
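A Descent Equation on an Adjoint Orbit
Let $Q = Q^T$ and $N = N^T$ be symmetric matrices and let $\psi(H)$ be a real valued function on $\mathrm{Sym}(\Lambda)$. What is the gradient of $\psi(H)$? The gradient on a Riemannian space is $G^{-1} d\psi$. On $\mathrm{Sym}(\Lambda)$ the inverse of the Riemannian metric is given by $[H, [H, \cdot]]$, and so the descent equation is
\[ \dot H = -[H, [H, d\psi(H)]] \]
Thus for $\psi(H) = \mathrm{tr}(HN)$ we have
\[ \dot H = -[H, [H, N]] \]
If $N$ is diagonal then $\mathrm{tr}(HN)$ achieves its minimum when $H$ is diagonal and similarly ordered with $-N$.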
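A minimal simulation sketch of the descent equation $\dot H = -[H,[H,N]]$ follows. The integrator is an assumption chosen for illustration: each step conjugates $H$ by a Cayley-transform approximation of $\exp(\Delta t\,[H,N])$, which keeps the eigenvalues of $H$ fixed up to rounding. Step size, horizon, and the test matrices are also illustrative.

```python
import numpy as np

def bracket(a, b):
    return a @ b - b @ a

def cayley(a):
    """Orthogonal approximation (I - a/2)^{-1} (I + a/2) of exp(a) for skew-symmetric a."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - a / 2, eye + a / 2)

def descend(h, n_mat, dt=0.01, steps=3000):
    """Integrate dH/dt = -[H,[H,N]] by orthogonal similarity steps H <- U H U^T."""
    costs = [np.trace(h @ n_mat)]
    for _ in range(steps):
        u = cayley(dt * bracket(h, n_mat))     # U approximates exp(dt [H, N])
        h = u @ h @ u.T
        costs.append(np.trace(h @ n_mat))
    return h, np.array(costs)

rng = np.random.default_rng(5)
theta0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
h0 = theta0.T @ np.diag([1.0, 2.0, 3.0]) @ theta0     # a point with eigenvalues 1, 2, 3
n_mat = np.diag([1.0, 2.0, 3.0])

h_final, costs = descend(h0, n_mat)
print(np.round(h_final, 4))       # close to diagonal
print(costs[0], costs[-1])        # tr(HN) before and after
```

With $N = \mathrm{diag}(1, 2, 3)$ the diagonal of the limit is expected to be ordered oppositely to $N$, which is the "similarly ordered with $-N$" condition.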
A Descent Equation with Multiple Equilibria
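Let $Q = Q^T$ and $N = N^T$ be diagonal matrices with distinct eigenvalues. The descent equation is
\[ \dot H = -[H, [H, d\psi(H)]] \]
Thus for $\psi(H) = \mathrm{tr}(HN)$ we have
\[ \dot H = -[H, [H, N]] \]
If $\psi(H) = -\mathrm{tr}(\mathrm{diag}(H) H)$ then $d\psi(H) = -2\,\mathrm{diag}(H)$ and
\[ \dot H = 2\,[H, [H, \mathrm{diag}(H)]] \]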
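The flow driven by $\mathrm{diag}(H)$ can be simulated in the same way. The sketch below integrates $\dot H = [H,[H,2\,\mathrm{diag}(H)]]$ from two random starting points and prints the diagonals it settles on; any ordering of the eigenvalues on the diagonal is an equilibrium, so the two runs need not agree. The Cayley-step integrator and all numerical settings are illustrative assumptions.

```python
import numpy as np

def bracket(a, b):
    return a @ b - b @ a

def cayley(a):
    """Orthogonal approximation (I - a/2)^{-1} (I + a/2) of exp(a) for skew-symmetric a."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - a / 2, eye + a / 2)

def diagonalize(h, dt=0.01, steps=3000):
    """Integrate dH/dt = [H,[H,2 diag(H)]] by orthogonal similarity steps."""
    for _ in range(steps):
        b = bracket(h, 2.0 * np.diag(np.diag(h)))   # B = [H, 2 diag(H)], skew-symmetric
        u = cayley(-dt * b)                          # U H U^T is H + dt [H, B] to first order
        h = u @ h @ u.T
    return h

rng = np.random.default_rng(6)
spectrum = np.diag([1.0, 2.0, 3.0])
results = []
for _ in range(2):
    theta0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    results.append(diagonalize(theta0.T @ spectrum @ theta0))

for h in results:
    print(np.round(np.diag(h), 4))                   # the eigenvalues in some order
```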
A Descent Equation with Smoothing Added
Consider replacing the system
\[ \dot H = [H, [H, N]] \]
with
\[ \dot H = [H, q(D)P], \qquad p(D)P = [H, N] \]
Here $D = d/dt$. This smooths the signals but does not alter the equilibrium points. Stability is unaffected if $q/p$ is a positive real function.
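One simple instance of the smoothed system, chosen here as an assumption for illustration, takes $q(D) = 1$ and $p(D) = \tau D + 1$, so that $\tau \dot P + P = [H, N]$ and $\dot H = [H, P]$. The ratio $q/p = 1/(\tau s + 1)$ is positive real. The sketch integrates this pair and checks that the state settles where $[H, N] = 0$; with this sign of the double bracket, $\mathrm{tr}(HN)$ increases, so the limit is ordered like $N$. The time constant, step size, and integrator are illustrative.

```python
import numpy as np

def bracket(a, b):
    return a @ b - b @ a

def cayley(a):
    """Orthogonal approximation (I - a/2)^{-1} (I + a/2) of exp(a) for skew-symmetric a."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - a / 2, eye + a / 2)

def smoothed_flow(h, n_mat, tau=0.2, dt=0.01, steps=4000):
    """Integrate dH/dt = [H, P], tau dP/dt + P = [H, N], starting from P = 0."""
    p = np.zeros_like(h)
    for _ in range(steps):
        u = cayley(-dt * p)                              # U H U^T is H + dt [H, P] to first order
        h_next = u @ h @ u.T
        p = p + (dt / tau) * (bracket(h, n_mat) - p)     # explicit Euler step for the filter state
        h = h_next
    return h, p

rng = np.random.default_rng(7)
theta0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
h0 = theta0.T @ np.diag([1.0, 2.0, 3.0]) @ theta0
n_mat = np.diag([1.0, 2.0, 3.0])

h_final, p_final = smoothed_flow(h0, n_mat)
print(np.round(h_final, 4))
print(np.linalg.norm(p_final), np.linalg.norm(bracket(h_final, n_mat)))
```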
The Double Bracket Flow for Analog Computation
Principal Components in $\mathbb{R}^n$
Learning without a teacher is sometimes approached by finding principal components.
\[ \dot W = x(t)\,x^T(t) - \text{forgetting term} \]
\[ \Theta^T(t)\,W(t)\,\Theta(t) = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \]
Columns of $\Theta$ are "components".
The principal components are assembled in a hidden layer.
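A minimal sketch of this idea in discrete time: accumulate $W$ from samples $x(t)$ and then diagonalize it. The slide does not specify the forgetting term, so the code assumes the simplest choice, $-W$, which makes $W$ an exponentially weighted average of $x x^T$. The data model (a hidden orthogonal matrix `r` and per-axis scales) is also an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
r, _ = np.linalg.qr(rng.standard_normal((n, n)))   # hidden orthogonal mixing (illustrative)
scales = np.array([3.0, 1.0, 0.5])                 # standard deviations along the columns of r

dt = 0.002
w = np.zeros((n, n))
for _ in range(20000):
    x = r @ (scales * rng.standard_normal(n))      # sample with covariance r diag(scales^2) r^T
    w += dt * (np.outer(x, x) - w)                 # dW/dt = x x^T - W (assumed forgetting term)

lam, theta = np.linalg.eigh(w)                     # Theta^T W Theta = diag(lam), ascending order
leading = theta[:, -1]                             # column of Theta for the largest eigenvalue
alignment = abs(leading @ r[:, 0])                 # near 1 when the dominant direction is found
print(np.round(lam, 3), round(alignment, 4))
```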
Let u be a vector of inputs, and let be a diagonal “editing” matrix that selects energy levels that are desirable. An adaptive subspace filter with input u and output y can be realized by implementing the equations
Some Equations
\[ \frac{dQ}{dt} = u u^T - (1 - \mathrm{tr}(Q))\,Q \]
\[ \frac{d\Theta}{dt} = [\Theta Q \Theta^T, N]\,\Theta \]
\[ y = \Theta \Theta^T u \]
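The role of the $\Theta$ equation can be seen in isolation by freezing $Q$. The sketch below is a simplification made for illustration, not the full filter: it holds $Q$ fixed at a symmetric matrix with known eigenvalues, leaves out the $Q$ and $y$ equations, and integrates $d\Theta/dt = [\Theta Q \Theta^T, N]\Theta$ with Cayley steps so that $\Theta$ stays orthogonal. With $N = \mathrm{diag}(1, 2, 3)$ the matrix $\Theta Q \Theta^T$ should approach a diagonal matrix with the largest eigenvalue first, and the rows of $\Theta$ then hold the corresponding eigenvectors of $Q$.

```python
import numpy as np

def bracket(a, b):
    return a @ b - b @ a

def cayley(a):
    """Orthogonal approximation (I - a/2)^{-1} (I + a/2) of exp(a) for skew-symmetric a."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - a / 2, eye + a / 2)

def extract(q, n_mat, dt=0.01, steps=4000):
    """Integrate dTheta/dt = [Theta Q Theta^T, N] Theta with Q held fixed."""
    theta = np.eye(q.shape[0])
    for _ in range(steps):
        a = bracket(theta @ q @ theta.T, n_mat)    # skew-symmetric generator
        theta = cayley(dt * a) @ theta             # orthogonal update of Theta
    return theta

rng = np.random.default_rng(8)
r, _ = np.linalg.qr(rng.standard_normal((3, 3)))
q = r @ np.diag([4.0, 2.0, 1.0]) @ r.T             # frozen Q with eigenvalues 4, 2, 1 (illustrative)
n_mat = np.diag([1.0, 2.0, 3.0])

theta = extract(q, n_mat)
h = theta @ q @ theta.T
print(np.round(h, 4))                              # close to diag(4, 2, 1)
print(np.round(q @ theta[0] - 4.0 * theta[0], 4))  # first row is an eigenvector for eigenvalue 4
```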
Neural Nets as Flows on Grassmann Manifolds
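Denote by $G(n,k)$ the space of $k$-planes in $n$-space. This space is a differentiable manifold that can be parameterized by the set of all $k$ by $n$ matrices of rank $k$. Adaptive subspace filters steer the weights so as to define a particular element of this space. Thus $\Lambda\Theta$ defines such a point if $\Lambda$ looks like
\[ \Lambda = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \]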
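To illustrate how a rank $k$ matrix names a point of $G(n,k)$, the sketch below assumes a $k$ by $n$ selection matrix with an identity block followed by zeros (an assumption about the intended shape), forms the product with an orthogonal $\Theta$, and computes the orthogonal projector onto the resulting row space. Multiplying by any invertible $k$ by $k$ matrix changes the parameterizing matrix but not the plane.

```python
import numpy as np

def projector(m):
    """Orthogonal projector onto the row space of a full-rank k-by-n matrix."""
    return m.T @ np.linalg.solve(m @ m.T, m)

rng = np.random.default_rng(9)
n, k = 5, 2
theta, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.hstack([np.eye(k), np.zeros((k, n - k))])   # assumed shape: identity block, then zeros
m = lam @ theta                                       # the first k rows of Theta, rank k
g = np.array([[2.0, 1.0], [0.0, 1.0]])                # an invertible k-by-k change of basis

p1 = projector(m)
p2 = projector(g @ m)
print(np.allclose(p1, p2))                            # same k-plane, different parameterization
print(np.allclose(p1 @ p1, p1), round(np.trace(p1), 6))
```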
Summary of Part 2
1. We have given some mathematical background necessary to work with flows on adjoint orbits and indicated some applications.
2. We have defined flows that will stabilize at invariant subspaces corresponding to the principal components of a vector process. These flows can be interpreted as flows that learn without a teacher.
3. We have argued that in spite of its limitations, steepest descent is usually the first choice in algorithm design.
4. We have interpreted a basic neural network algorithm as a flow in a Grassmann manifold generated by a steepest descent tracking algorithm.