Lecture: Face Recognition and Feature Reduction
CS131, Stanford University (cs131.stanford.edu/files/12_svd_pca.pdf)
Juan Carlos Niebles and Ranjay Krishna, Stanford Vision and Learning Lab
2-Nov-17
Recap: Curse of dimensionality
• Assume 5000 points uniformly distributed in the unit hypercube, and we want to apply 5-NN. Suppose our query point is at the origin.
– In 1 dimension, we must go a distance of 5/5000 = 0.001 on average to capture the 5 nearest neighbors.
– In 2 dimensions, we must go out to a square that contains 0.001 of the volume.
– In d dimensions, we must go out to (0.001)^(1/d) along each dimension.
What we will learn today
• Singular value decomposition
• Principal Component Analysis (PCA)
• Image compression
Singular Value Decomposition (SVD)
• There are several computer algorithms that can "factorize" a matrix, representing it as the product of some other matrices
• The most useful of these is the Singular Value Decomposition
• Represents any matrix A as a product of three matrices: UΣV^T
• Python command:
– U, S, Vt = numpy.linalg.svd(A)  (note: numpy returns V^T, not V, as the third value)
Singular Value Decomposition (SVD)
• UΣV^T = A
• Where U and V are rotation matrices, and Σ is a scaling matrix

Singular Value Decomposition (SVD)
• Beyond 2x2 matrices:
– In general, if A is m x n, then U will be m x m, Σ will be m x n, and V^T will be n x n
– (Note the dimensions work out to produce m x n after multiplication)
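As a quick check of these shapes in numpy (a minimal sketch; the random matrix A is an arbitrary stand-in, not an example from the slides):

    import numpy as np

    A = np.random.rand(5, 3)           # m x n with m=5, n=3
    U, S, Vt = np.linalg.svd(A)        # full SVD
    print(U.shape, S.shape, Vt.shape)  # (5, 5) (3,) (3, 3)

    # numpy returns the singular values as a vector; rebuild the m x n Sigma
    Sigma = np.zeros(A.shape)
    np.fill_diagonal(Sigma, S)
    print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T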
Singular Value Decomposition (SVD)
• U and V are always rotation matrices
– Geometric rotation may not be an applicable concept, depending on the matrix. So we call them "unitary" matrices – each column is a unit vector
• Σ is a diagonal matrix
– The number of nonzero entries = rank of A
– The algorithm always sorts the entries high to low
SVD Applications
• We've discussed SVD in terms of geometric transformation matrices
• But SVD of an image matrix can also be very useful
• To understand this, we'll look at a less geometric interpretation of what SVD is doing
SVD Applications
• Look at how the multiplication works out, left to right:
• Column 1 of U gets scaled by the first value from Σ
• The resulting vector gets scaled by row 1 of V^T to produce a contribution to the columns of A

SVD Applications
• Each product of (column i of U) · (value i from Σ) · (row i of V^T) produces a component of the final A
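In code, this sum of rank-1 components looks like the following (a minimal sketch with an arbitrary example matrix):

    import numpy as np

    A = np.random.rand(4, 4)
    U, S, Vt = np.linalg.svd(A)

    # Sum of components: (column i of U) * (value i from S) * (row i of Vt)
    A_rebuilt = sum(S[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(S)))
    print(np.allclose(A, A_rebuilt))  # True: all components rebuild A exactly

    # A partial reconstruction from only the first k components
    k = 2
    A_partial = sum(S[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))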
SVD Applications
• We're building A as a linear combination of the columns of U
• Using all columns of U, we'll rebuild the original matrix perfectly
• But, in real-world data, often we can just use the first few columns of U and we'll get something close (e.g. the A_partial above)
SVD Applications
• We can call those first few columns of U the Principal Components of the data
• They show the major patterns that can be added to produce the columns of the original matrix
• The rows of V^T show how the principal components are mixed to produce the columns of the matrix
SVD Applications
• We can look at Σ to see that the first column has a large effect, while the second column has a much smaller effect in this example
SVD Applications
• For this image, using only the first 10 of 300 principal components produces a recognizable reconstruction
• So, SVD can be used for image compression
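A minimal sketch of SVD-based image compression (the image here is a random stand-in, and k = 10 follows the slide's example):

    import numpy as np

    img = np.random.rand(300, 400)  # a grayscale image as a 2-D array

    U, S, Vt = np.linalg.svd(img, full_matrices=False)

    k = 10  # keep only the first k principal components
    img_compressed = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

    # Storage drops from 300*400 values to k*(300 + 400 + 1)
    print(img.size, k * (img.shape[0] + img.shape[1] + 1))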
SVD for symmetric matrices
• If A is a symmetric matrix, it can be decomposed as A = UΣU^T
• Compared to a traditional SVD decomposition, here U = V, and U is an orthogonal matrix
Principal Component Analysis
• Remember, columns of U are the Principal Components of the data: the major patterns that can be added to produce the columns of the original matrix
• One use of this is to construct a matrix where each column is a separate data sample
• Run SVD on that matrix, and look at the first few columns of U to see patterns that are common among the columns
• This is called Principal Component Analysis (or PCA) of the data samples
Principal Component Analysis
• Often, raw data samples have a lot of redundancy and patterns
• PCA can allow you to represent data samples as weights on the principal components, rather than using the original raw form of the data
• By representing each sample as just those weights, you can represent just the "meat" of what's different between samples
• This minimal representation makes machine learning and other algorithms much more efficient
How is SVD computed?
• For this class: tell Python to do it. Use the result.
• But, if you're interested, one computer algorithm to do it makes use of eigenvectors!
Eigenvector definition
• Suppose we have a square matrix A. We can solve for a vector x and scalar λ such that Ax = λx
• In other words, find vectors where, if we transform them with A, the only effect is to scale them with no change in direction
• These vectors are called eigenvectors (German for "self vector" of the matrix), and the scaling factors λ are called eigenvalues
• An m x m matrix will have ≤ m eigenvectors where λ is nonzero
Finding eigenvectors
• Computers can find an x such that Ax = λx using this iterative algorithm:
– x = random unit vector
– while (x hasn't converged):
• x = Ax
• normalize x
• x will quickly converge to an eigenvector
• Some simple modifications will let this algorithm find all eigenvectors
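A runnable version of this loop (a minimal sketch; the tolerance and iteration cap are illustrative choices):

    import numpy as np

    def power_iteration(A, tol=1e-10, max_iters=1000):
        """Find a dominant eigenvector of the square matrix A."""
        x = np.random.rand(A.shape[0])
        x /= np.linalg.norm(x)              # x = random unit vector
        for _ in range(max_iters):
            x_new = A @ x                   # x = Ax
            x_new /= np.linalg.norm(x_new)  # normalize x
            # converged? (compare against -x too, since the sign may flip)
            if min(np.linalg.norm(x_new - x), np.linalg.norm(x_new + x)) < tol:
                x = x_new
                break
            x = x_new
        return x @ A @ x, x                 # eigenvalue (Rayleigh quotient), eigenvector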
Finding SVD
• Eigenvectors are for square matrices, but SVD is for all matrices
• To do svd(A), computers can do this:
– Take eigenvectors of AA^T (this matrix is always square)
• These eigenvectors are the columns of U
• The square roots of the eigenvalues are the singular values (the entries of Σ)
– Take eigenvectors of A^T A (this matrix is always square)
• These eigenvectors are columns of V (or rows of V^T)
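To check this relationship numerically (a minimal sketch with an arbitrary example matrix):

    import numpy as np

    A = np.random.rand(5, 3)

    # Eigenvalues of A A^T are the squared singular values of A
    eigvals = np.linalg.eigvalsh(A @ A.T)            # symmetric, so eigvalsh
    roots = np.sort(np.sqrt(np.abs(eigvals)))[::-1]  # sorted high to low

    singular_values = np.linalg.svd(A, compute_uv=False)
    print(np.allclose(roots[:3], singular_values))   # True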
Finding SVD
• Moral of the story: SVD is fast, even for large matrices
• It's useful for a lot of stuff
• There are also other algorithms to compute the SVD or part of the SVD
– Python's np.linalg.svd() command has options to efficiently compute only what you need, if performance becomes an issue

A detailed geometric explanation of SVD is here: http://www.ams.org/samplings/feature-column/fcarc-svd
What we will learn today
• Introduction to face recognition
• Principal Component Analysis (PCA)
• Image compression
Covariance
• Variance and covariance are a measure of the "spread" of a set of points around their center of mass (mean)
• Variance – a measure of the deviation from the mean for points in one dimension, e.g. heights
• Covariance – a measure of how much each of the dimensions varies from the mean with respect to each other
• Covariance is measured between 2 dimensions to see if there is a relationship between the 2 dimensions, e.g. number of hours studied & marks obtained
• The covariance between one dimension and itself is the variance
Covariance
• So, if you had a 3-dimensional data set (x, y, z), then you could measure the covariance between the x and y dimensions, the y and z dimensions, and the x and z dimensions. Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z dimensions respectively.
Covariance matrix
• Representing covariance between dimensions as a matrix, e.g. for 3 dimensions:

    C = | cov(x,x)  cov(x,y)  cov(x,z) |
        | cov(y,x)  cov(y,y)  cov(y,z) |
        | cov(z,x)  cov(z,y)  cov(z,z) |

• Diagonal is the variances of x, y and z
• cov(x,y) = cov(y,x), hence the matrix is symmetrical about the diagonal
• N-dimensional data will result in an N x N covariance matrix
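Computing such a matrix with numpy (a minimal sketch; the data values are arbitrary):

    import numpy as np

    # Three dimensions (x, y, z): one row per dimension, one column per sample
    data = np.array([[2.1, 2.5, 3.6, 4.0],     # x
                     [8.0, 10.0, 12.0, 14.0],  # y
                     [1.0, 0.8, 0.6, 0.2]])    # z

    C = np.cov(data)            # 3 x 3 covariance matrix
    print(np.diag(C))           # var(x), var(y), var(z)
    print(np.allclose(C, C.T))  # True: symmetric about the diagonal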
Covariance
• What is the interpretation of covariance calculations?
– e.g.: 2-dimensional data set
– x: number of hours studied for a subject
– y: marks obtained in that subject
– covariance value is, say, 104.53
– what does this value mean?
Covariance interpretation
• The exact value is not as important as its sign.
• A positive value of covariance indicates both dimensions increase or decrease together, e.g. as the number of hours studied increases, the marks in that subject increase.
• A negative value indicates while one increases the other decreases, or vice-versa, e.g. active social life at PSU vs performance in CS dept.
• If covariance is zero: the two dimensions are uncorrelated, e.g. heights of students vs the marks obtained in a subject. (Zero covariance follows from independence, but does not by itself guarantee it.)
Example data
• Covariance between the two axes is high. Can we reduce the number of dimensions to just 1?
Geometric interpretation of PCA
• Let's say we have a set of 2D data points x. But we see that all the points lie on a line in 2D.
• So, 2 dimensions are redundant to express the data. We can express all the points with just one dimension.
[Figure: a 1D subspace in 2D]
PCA: Principal Component Analysis
• Given a set of points, how do we know if they can be compressed like in the previous example?
– The answer is to look into the correlation between the points
– The tool for doing this is called PCA
PCA Formulation
• Basic idea:
– If the data lives in a subspace, it is going to look very flat when viewed from the full space, e.g.
[Figure: a 1D subspace in 2D; a 2D subspace in 3D]
Slide inspired by N. Vasconcelos

PCA Formulation
• Assume x is Gaussian with covariance Σ.
• Recall that a Gaussian is defined by its mean and covariance:
    p(x) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(1/2)(x - μ)^T Σ^(-1) (x - μ))
• Recall that μ and Σ of a Gaussian are defined as:
    μ = E[x],   Σ = E[(x - μ)(x - μ)^T]
[Figure: a 2D Gaussian over (x1, x2) with principal axes φ1, φ2 and principal lengths λ1, λ2]
PCA formulation
• The covariance matrix Σ is symmetric, so we can express it as:
– Σ = UΛU^T = UΛ^(1/2) (UΛ^(1/2))^T
PCA Formulation
• If x is Gaussian with covariance Σ,
– Principal components φi are the eigenvectors of Σ
– Principal lengths λi are the eigenvalues of Σ
• By computing the eigenvalues we know whether the data is
– Not flat if λ1 ≈ λ2
– Flat if λ1 >> λ2
PCA Algorithm (training)
[Slide figure: the PCA training algorithm]

PCA Algorithm (testing)
[Slide figure: the PCA testing algorithm]
Slide inspired by N. Vasconcelos
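The algorithm boxes on these two slides are figures; below is a minimal sketch of the standard PCA training and testing steps, consistent with the "PCA by SVD" summary later in this lecture (function names are illustrative):

    import numpy as np

    def pca_train(X, k):
        """X: d x n data matrix, one example per column; keep k components."""
        mu = X.mean(axis=1)                       # sample mean
        Xc = X - mu[:, None]                      # centered data
        Sigma = (Xc @ Xc.T) / X.shape[1]          # sample covariance (d x d)
        eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigendecomposition
        order = np.argsort(eigvals)[::-1][:k]     # top-k eigenvalues
        return mu, eigvecs[:, order]              # mean and principal components

    def pca_test(x, mu, components):
        """Project a new d-dimensional sample onto the principal components."""
        return components.T @ (x - mu)            # k-dimensional weight vector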
PCA by SVD
• An alternative manner to compute the principal components, based on singular value decomposition
• Quick reminder: SVD
– Any real n x m matrix (n > m) can be decomposed as
    A = MΠN^T
– Where M is an (n x m) column orthonormal matrix of left singular vectors (columns of M)
– Π is an (m x m) diagonal matrix of singular values
– N^T is an (m x m) row orthonormal matrix of right singular vectors (columns of N)
Slide inspired by N. Vasconcelos
PCA by SVD
• To relate this to PCA, we consider the data matrix
    X = [x1  x2  ...  xn]   (one example per column)
• The sample mean is
    μ = (1/n) Σi xi
PCA by SVD
• Center the data by subtracting the mean from each column of X
• The centered data matrix is
    Xc = [x1 - μ  x2 - μ  ...  xn - μ]
PCA by SVD
• The sample covariance matrix is
    Σ = (1/n) Σi (xi - μ)(xi - μ)^T = (1/n) Σi xi_c (xi_c)^T
  where xi_c is the ith column of Xc
• This can be written as
    Σ = (1/n) Xc Xc^T
PCA by SVD
• The matrix Xc^T is real (n x d). Assuming n > d, it has the SVD decomposition
    Xc^T = MΠN^T
• and therefore
    Σ = (1/n) Xc Xc^T = (1/n) N Π M^T M Π N^T = N ((1/n) Π^2) N^T
PCA by SVD
• Note that N is (d x d) and orthonormal, and Π^2 is diagonal. This is just the eigenvalue decomposition of Σ
• It follows that
– The eigenvectors of Σ are the columns of N
– The eigenvalues of Σ are λi = (1/n) πi^2
• This gives an alternative algorithm for PCA
PCA by SVD
• In summary, computation of PCA by SVD
• Given X with one example per column
– Create the centered data matrix Xc
– Compute its SVD: Xc^T = MΠN^T
– Principal components are the columns of N; eigenvalues are λi = (1/n) πi^2
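This summary translates directly into a few lines of numpy (a minimal sketch; the variable names follow the slides, and the data is random):

    import numpy as np

    X = np.random.rand(5, 100)              # d=5 dims, n=100 examples as columns
    d, n = X.shape

    Xc = X - X.mean(axis=1, keepdims=True)  # centered data matrix
    M, Pi, Nt = np.linalg.svd(Xc.T, full_matrices=False)  # Xc^T = M Pi N^T

    principal_components = Nt.T             # columns of N
    eigenvalues = Pi**2 / n                 # lambda_i = (1/n) pi_i^2

    # Check against the eigendecomposition of the sample covariance
    Sigma = Xc @ Xc.T / n
    print(np.allclose(np.sort(np.linalg.eigvalsh(Sigma))[::-1], eigenvalues))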
Rule of thumb for finding the number of PCA components
• A natural measure is to pick the eigenvectors that explain p% of the data variability
– Can be done by plotting the ratio rk = (λ1 + ... + λk) / (λ1 + ... + λn) as a function of k
– E.g. we need 3 eigenvectors to cover 70% of the variability of this data set
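Computing this ratio for every k at once (a minimal sketch; the eigenvalues are example numbers):

    import numpy as np

    eigenvalues = np.array([4.0, 2.5, 1.5, 0.5, 0.3, 0.2])  # sorted high to low
    r = np.cumsum(eigenvalues) / np.sum(eigenvalues)        # r_k for k = 1..n

    p = 0.70
    k = int(np.searchsorted(r, p)) + 1  # smallest k with r_k >= p
    print(r, k)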
What we will learn today
• Introduction to face recognition
• Principal Component Analysis (PCA)
• Image compression
Original Image
• Divide the original 372x492 image into patches:
• Each patch is an instance that contains 12x12 pixels on a grid
• View each as a 144-D vector
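A minimal sketch of this patch construction (the image array is a random stand-in for the slide's 372x492 picture):

    import numpy as np

    img = np.random.rand(372, 492)  # grayscale image
    p = 12                          # 12x12 patches

    # Cut the image into non-overlapping patches on a grid
    patches = [img[r:r+p, c:c+p].ravel()      # each patch -> 144-D vector
               for r in range(0, img.shape[0], p)
               for c in range(0, img.shape[1], p)]
    X = np.array(patches).T         # 144 x n data matrix, one patch per column
    print(X.shape)                  # (144, 1271)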
L2 error and PCA dimension
[Plot: L2 reconstruction error as a function of the number of PCA components]
PCA compression: 144D → 60D
[Figure: the image reconstructed from 60 principal components]

PCA compression: 144D → 16D
[Figure: the image reconstructed from 16 principal components]
16 most important eigenvectors
[Figure: the 16 most important eigenvectors, each displayed as a 12x12 patch]
PCA compression: 144D → 6D
[Figure: the image reconstructed from 6 principal components]
6 most important eigenvectors
[Figure: the 6 most important eigenvectors, each displayed as a 12x12 patch]
PCA compression: 144D → 3D
[Figure: the image reconstructed from 3 principal components]
3 most important eigenvectors
[Figure: the 3 most important eigenvectors, each displayed as a 12x12 patch]
PCA compression: 144D → 1D
[Figure: the image reconstructed from 1 principal component]
What we have learned today
• Introduction to face recognition
• Principal Component Analysis (PCA)
• Image compression