CPSC 340: Machine Learning and Data Mining
Linear Classifiers (Spring 2019)
Last Time: L1-Regularization

• We discussed L1-regularization:
  – Also known as "LASSO" and "basis pursuit denoising".
  – Regularizes 'w' so we decrease our test error (like L2-regularization).
  – Yields sparse 'w', so it selects features (like L0-regularization).
• Properties:
  – It's convex and fast to minimize (with "proximal-gradient" methods).
  – The solution is not unique (sometimes people use both L2- and L1-regularization).
  – Usually includes the "correct" variables, but tends to yield false positives.
L*-Regularization

• L0-regularization (AIC, BIC, Mallow's Cp, adjusted R², ANOVA):
  – Adds a penalty on the number of non-zeros to select features.
• L2-regularization (ridge regression):
  – Adds a penalty on the L2-norm of 'w' to decrease overfitting.
• L1-regularization (LASSO):
  – Adds a penalty on the L1-norm, which decreases overfitting and selects features (the three objectives are written out below).
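The slides show these as equation images; written out in the notation of earlier lectures, the three objectives are:

```latex
% L0-regularization: penalize the number of non-zeros (non-convex).
f(w) = \tfrac{1}{2}\|Xw - y\|^2 + \lambda \|w\|_0

% L2-regularization (ridge regression): penalize the (squared) L2-norm.
f(w) = \tfrac{1}{2}\|Xw - y\|^2 + \tfrac{\lambda}{2}\|w\|^2

% L1-regularization (LASSO): penalize the L1-norm.
f(w) = \tfrac{1}{2}\|Xw - y\|^2 + \lambda \|w\|_1
```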
L0- vs. L1- vs. L2-Regularization

|                   | Sparse 'w' (selects features) | Speed | Unique 'w' | Coding effort | Irrelevant features |
|-------------------|-------------------------------|-------|------------|---------------|---------------------|
| L0-regularization | Yes                           | Slow  | No         | Few lines     | Not sensitive       |
| L1-regularization | Yes*                          | Fast* | No         | 1 line*       | Not sensitive       |
| L2-regularization | No                            | Fast  | Yes        | 1 line        | A bit sensitive     |

• L1-regularization isn't as sparse as L0-regularization.
  – L1-regularization tends to give more false positives (selects too many).
  – And it's only "fast" and "1 line" with specialized solvers.
• The cost of L2-regularized least squares is O(nd² + d³).
  – This changes to O(ndt) for 't' iterations of gradient descent (same for L1).
• "Elastic net" (L1- and L2-regularization) is sparse, fast, and unique.
• Using L0+L2 does not give a unique solution.
Ensemble Feature Selection

• We can also use ensemble methods for feature selection.
  – These are usually designed to reduce false positives or reduce false negatives.
• In the case of L1-regularization, we want to reduce false positives.
  – Unlike L0-regularization, the non-zero wj are still "shrunk".
  – "Irrelevant" variables get included before the "relevant" wj reach their best values.
• A bootstrap approach to reducing false positives:
  – Apply the method to bootstrap samples of the training data.
  – Only take the features selected in all bootstrap samples.
Ensemble Feature Selection

• Example: bootstrapping plus L1-regularization ("BoLASSO").
  – Reduces false positives.
  – It's possible to show that it recovers the "correct" variables under weaker conditions (a sketch of the procedure is below).
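A minimal sketch of the bootstrap-intersection idea, assuming scikit-learn's `Lasso` as the L1-regularized base method (the function name, `alpha`, and the number of bootstraps are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_features(X, y, n_bootstraps=100, alpha=0.1, seed=0):
    """Keep only features whose Lasso coefficient is non-zero in
    every bootstrap sample (the intersection rule described above)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    selected = np.ones(d, dtype=bool)      # start by keeping every feature
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)   # sample n rows with replacement
        w = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        selected &= (w != 0)               # intersect the selected supports
    return np.flatnonzero(selected)        # indices of surviving features
```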
(pause)
Motivation: Identifying Important E-mails

• How can we automatically identify 'important' e-mails?
• This is a binary classification problem ("important" vs. "not important").
  – Labels are approximated by whether you took an "action" based on the e-mail.
  – It has a high-dimensional feature set (that we'll discuss later).
• Gmail uses regression for this binary classification problem.
Binary Classification Using Regression?

• Can we apply linear models for binary classification?
  – Set yi = +1 for one class ("important").
  – Set yi = -1 for the other class ("not important").
• At training time, fit a linear regression model (by minimizing the usual squared error).
• The model will try to make wTxi = +1 for "important" e-mails, and wTxi = -1 for "not important" e-mails.
Binary Classification Using Regression?

• Can we apply linear models for binary classification?
  – Set yi = +1 for one class ("important").
  – Set yi = -1 for the other class ("not important").
• The linear model gives real numbers like 0.9, -1.1, and so on.
• So to predict, we look at whether wTxi is closer to +1 or to -1:
  – If wTxi = 0.9, predict ŷi = +1.
  – If wTxi = -1.1, predict ŷi = -1.
  – If wTxi = 0.1, predict ŷi = +1.
  – If wTxi = -100, predict ŷi = -1.
  – We write this operation (rounding to +1 or -1) as ŷi = sign(wTxi).
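A minimal NumPy sketch of this recipe, assuming the labels are already encoded as ±1 (the function names are illustrative):

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit 'w' by ordinary least squares on +1/-1 labels."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(X, w):
    """Round each w^T x_i to the closer of +1/-1, i.e. sign(w^T x_i)."""
    return np.sign(X @ w)
```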
Decision Boundary in 1D
Decision Boundary in 1D

• We can interpret 'w' as a hyperplane separating the xi into two sets:
  – The set where wTxi > 0 and the set where wTxi < 0.
Decision Boundary in 2D

(Figure: decision boundaries of a decision tree, KNN, and a linear classifier.)

• A linear classifier would be a linear function ŷi = w0 + w1xi1 + w2xi2 coming out of the page (the decision boundary is at ŷi = 0).
Should we use least squares for classification?

• Consider training by minimizing the squared error, f(w) = ½||Xw - y||², with yi that are +1 or -1:
• If we predict wTxi = +0.9 and yi = +1, the error is small: (0.9 - 1)² = 0.01.
• If we predict wTxi = -0.8 and yi = +1, the error is bigger: (-0.8 - 1)² = 3.24.
• If we predict wTxi = +100 and yi = +1, the error is huge: (100 - 1)² = 9801.
  – But it shouldn't be: the prediction has the correct sign.
• Least squares penalizes you for being "too right".
  – +100 has the right sign, so the error should be zero.
Should we use least squares for classification?

• Least squares can behave weirdly when applied to classification:
• Why? The squared error of the green line in the slide's figure is huge!
  – Make sure you understand why the green line achieves 0 training error.
"0-1 Loss" Function: Minimizing Classification Errors

• Could we instead minimize the number of classification errors?
  – This is called the 0-1 loss function (written out below).
• You either get the classification wrong (1) or right (0).
  – We can write it using the L0-norm as ||ŷ - y||₀.
• Unlike regression, in classification it's reasonable to expect ŷi = yi exactly (each is either +1 or -1).
• Important special case: "linearly separable" data.
  – The classes can be "separated" by a hyperplane.
  – So a perfect linear classifier exists.
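Written out (the slides show this as an equation image), the 0-1 loss is:

```latex
f(w) = \sum_{i=1}^{n} \mathbb{I}\!\left[\operatorname{sign}(w^\top x_i) \neq y_i\right]
     = \|\hat{y} - y\|_0,
\qquad \text{where } \hat{y}_i = \operatorname{sign}(w^\top x_i).
```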
Perceptron Algorithm for Linearly-Separable Data

• One of the first "learning" algorithms was the "perceptron" (1957).
  – It searches for a 'w' such that sign(wTxi) = yi for all i.
• Perceptron algorithm:
  – Start with w0 = 0.
  – Go through the examples in any order until you make a mistake predicting some yi.
    • Set wt+1 = wt + yixi.
  – Keep going through the examples until you make no errors on the training data.
• If a perfect classifier exists, this algorithm finds one in a finite number of steps (a runnable sketch is below).
• Intuition for the update: if yi = +1, "add more of xi to w" so that wTxi is larger (the update increases wTxi by the squared norm ||xi||²).
  – If yi = -1, you would instead be subtracting the squared norm.
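A minimal NumPy sketch of the algorithm above; the `max_passes` guard is an added safety net for non-separable data, not part of the original algorithm:

```python
import numpy as np

def perceptron(X, y, max_passes=1000):
    """Repeat passes over the data, applying w <- w + y_i x_i on
    every mistake, until one full pass makes no errors."""
    n, d = X.shape
    w = np.zeros(d)                          # start with w = 0
    for _ in range(max_passes):
        mistakes = 0
        for i in range(n):
            if np.sign(w @ X[i]) != y[i]:    # mistake on example i
                w = w + y[i] * X[i]          # perceptron update
                mistakes += 1
        if mistakes == 0:                    # perfect on training data
            return w
    raise RuntimeError("no perfect classifier found; data may not be separable")
```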
(Figure from https://en.wikipedia.org/wiki/Perceptron)
Geometry of why we want the 0-1 loss
Thoughts on the previous (and next) slide

• We are now plotting the loss vs. the predicted value wTxi.
  – This is "loss space", which is different from parameter space or data space.
• We're plotting the individual loss for a particular training example.
  – In the figure the label is yi = -1 (so the loss is centered at -1).
  – It would be centered at +1 when yi = +1.
• (The next slide is the same as the previous one.)
Geometry of why we want the 0-1 loss (continued over several figure-only slides)
0-1 Loss Function

• Unfortunately, the 0-1 loss is non-convex in 'w'.
  – It's easy to minimize if a perfect classifier exists (perceptron).
  – Otherwise, finding the 'w' minimizing the 0-1 loss is a hard problem.
  – The gradient is zero everywhere: we don't even know "which way to go".
  – This is NOT the same type of problem we had with the squared loss: there we can minimize the squared error, but it might give a bad model for classification.
• This motivates convex approximations to the 0-1 loss…
Degenerate Convex Approximation to 0-1 Loss

• If yi = +1, we get the label right if wTxi > 0.
• If yi = -1, we get the label right if wTxi < 0, or equivalently -wTxi > 0.
• So "classifying example 'i' correctly" is equivalent to having yiwTxi > 0.
• One possible convex approximation to the 0-1 loss:
  – Minimize how much this constraint is violated.
Degenerate Convex Approximation to 0-1 Loss

• Our convex approximation of the error for one example, and the training objective that sums it over all examples, are written out below.
• But this has a degenerate solution:
  – We have f(0) = 0, and this is the lowest possible value of 'f'.
• There are two standard fixes: the hinge loss and the logistic loss.
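Reconstructed from the "violation of yiwTxi > 0" description above (the slides show these as equation images):

```latex
% Error on one example: how much the constraint y_i w^\top x_i > 0 is violated.
\mathrm{error}_i(w) = \max\{0,\, -y_i w^\top x_i\}

% Training objective: the sum of the violations over all examples.
f(w) = \sum_{i=1}^{n} \max\{0,\, -y_i w^\top x_i\}

% Degenerate: w = 0 gives f(0) = 0, the lowest possible value.
```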
Summary

• Ensemble feature selection reduces false positives or false negatives.
• Binary classification using regression:
  – Encode the labels using yi in {-1, +1}.
  – Use sign(wTxi) as the prediction.
  – This gives a "linear classifier" (a hyperplane splitting the space in half).
• Least squares is a weird error for classification.
• Perceptron algorithm: finds a perfect classifier (if one exists).
• The 0-1 loss is the ideal loss, but it is non-smooth and non-convex.
• Next time: one of the best "out of the box" classifiers.
L1-Regularization as a Feature Selection Method

• Advantages:
  – Deals with conditional independence (if the model is linear).
  – Sort of deals with collinearity:
    • Picks at least one of "mom" and "mom2".
  – Very fast with specialized algorithms.
• Disadvantages:
  – Tends to give false positives (selects too many variables).
• Neither good nor bad:
  – Does not take small effects into account.
  – Says "gender" is relevant if we know "baby".
  – Good for prediction if we want fast training and don't care about having some irrelevant variables included.
"Elastic Net": L2- and L1-Regularization

• To address non-uniqueness, some authors use both L2- and L1-regularization (the objective is written out below):
• This is called "elastic net" regularization.
  – The solution is sparse and unique.
  – It is slightly better with feature dependence:
    • Selects both "mom" and "mom2".
• Optimization is easier, though the objective is still non-differentiable.
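The elastic-net objective appears as an equation image in the slides; one common way to write it, with separate regularization strengths λ1 and λ2, is:

```latex
f(w) = \tfrac{1}{2}\|Xw - y\|^2 + \lambda_1 \|w\|_1 + \tfrac{\lambda_2}{2}\|w\|^2
```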
L1-Regularization: Debiasing and Filtering

• To remove false positives, some authors add a debiasing step (sketched below):
  – Fit 'w' using L1-regularization.
  – Grab the non-zero values of 'w' as the "relevant" variables.
  – Re-fit the relevant 'w' using least squares or L2-regularized least squares.
• A related use of L1-regularization is as a filtering method:
  – Fit 'w' using L1-regularization.
  – Grab the non-zero values of 'w' as the "relevant" variables.
  – Run standard (slow) variable selection restricted to the relevant variables.
    • Forward selection, exhaustive search, stochastic local search, etc.
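A minimal sketch of the debiasing step, again assuming scikit-learn's `Lasso` (the `alpha` value is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, alpha=0.1):
    """Use Lasso only to pick the support, then re-fit the selected
    coefficients by ordinary least squares to undo the shrinkage."""
    support = np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_)
    w = np.zeros(X.shape[1])
    if support.size > 0:
        w[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return w
```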
Non-Convex Regularizers

• Regularizing |wj|² selects all features.
• Regularizing |wj| selects fewer, but still gives many false positives.
• What if we instead regularize |wj|^(1/2) (see the objective below)?
• Minimizing this objective would lead to fewer false positives.
  – There is less need for debiasing, but the objective is not convex and is hard to minimize.
• There are many non-convex regularizers with similar properties.
  – L1-regularization is (basically) the "most sparse" convex regularizer.
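Written out, the objective with the |wj|^(1/2) regularizer referred to above would be:

```latex
f(w) = \tfrac{1}{2}\|Xw - y\|^2 + \lambda \sum_{j=1}^{d} |w_j|^{1/2}
```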
Can we just use least squares??

• What went wrong?
  – "Good" errors vs. "bad" errors.
Online Classification with Perceptron

• Perceptron for online linear binary classification [Rosenblatt, 1957]:
  – Start with w0 = 0.
  – At time 't' we receive features xt.
  – We predict ŷt = sign(wtTxt).
  – If ŷt ≠ yt, then set wt+1 = wt + ytxt.
    • Otherwise, set wt+1 = wt.
• (These slides are old, so above I'm using subscripts of 't' instead of superscripts.)
• Perceptron mistake bound [Novikoff, 1962]:
  – Assume the data is linearly separable with a "margin" γ:
    • There exists a w* with ||w*|| = 1 such that sign(xtTw*) = sign(yt) for all 't', and |xtTw*| ≥ γ.
  – Then the total number of mistakes is bounded.
    • There is no requirement that the data is IID.
Perceptron Mistake Bound

• Let's normalize each xt so that ||xt|| = 1.
  – Length doesn't change the label.
• Whenever we make a mistake, we have sign(yt) ≠ sign(wtTxt), and the squared norm of 'w' grows by at most 1 (derivation below).
• So after 'k' errors we have ||wt||² ≤ k.
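The missing algebra, reconstructed from the update rule wt+1 = wt + ytxt, the normalization ||xt|| = 1, and yt² = 1:

```latex
\|w_{t+1}\|^2 = \|w_t + y_t x_t\|^2
             = \|w_t\|^2 + 2\, y_t w_t^\top x_t + y_t^2 \|x_t\|^2
             \le \|w_t\|^2 + 1,
% since making a mistake means y_t w_t^\top x_t \le 0, and y_t^2 \|x_t\|^2 = 1.
% Starting from w_0 = 0, after k mistakes this gives \|w_t\|^2 \le k.
```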
Perceptron Mistake Bound

• Let's consider a solution w*, so sign(yt) = sign(xtTw*).
  – And let's choose a w* with ||w*|| = 1.
• Whenever we make a mistake, wtTw* grows by at least γ (derivation below).
  – Note: wtTw* ≥ 0 by induction (it starts at 0, then is at least as big as the old value plus γ).
• So after 'k' mistakes we have ||wt|| ≥ γk.
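Reconstructed from the update rule and the margin assumption:

```latex
w_{t+1}^\top w^* = (w_t + y_t x_t)^\top w^*
                 = w_t^\top w^* + y_t x_t^\top w^*
                 \ge w_t^\top w^* + \gamma,
% since sign(x_t^\top w^*) = sign(y_t) and |x_t^\top w^*| \ge \gamma imply
% y_t x_t^\top w^* \ge \gamma. Starting from w_0 = 0, after k mistakes
% w_t^\top w^* \ge \gamma k, and Cauchy-Schwarz with \|w^*\| = 1 gives
% \|w_t\| \ge w_t^\top w^* \ge \gamma k.
```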
Perceptron Mistake Bound

• So our two bounds are ||wt|| ≤ sqrt(k) and ||wt|| ≥ γk.
• This gives γk ≤ sqrt(k), or a maximum of 1/γ² mistakes.
  – Note that γ > 0 by assumption, and it is upper-bounded by 1 since ||xt|| ≤ 1.
  – After this 'k', under our assumptions, we're guaranteed to have a perfect classifier.