Probability Theory for Machine Learning
Chris Cremer, September 2015
Outline
• Motivation
• Probability Definitions and Rules
• Probability Distributions
• MLE for Gaussian Parameter Estimation
• MLE and Least Squares
• Least Squares Demo
Material
• Pattern Recognition and Machine Learning - Christopher M. Bishop
• All of Statistics - Larry Wasserman
• Wolfram MathWorld
• Wikipedia
Motivation
• Uncertainty arises through:
  • Noisy measurements
  • Finite size of data sets
  • Ambiguity: the word bank can mean (1) a financial institution, (2) the side of a river, or (3) tilting an airplane. Which meaning was intended, based on the words that appear nearby?
• Limited model complexity
• Probability theory provides a consistent framework for the quantification and manipulation of uncertainty
• Allows us to make optimal predictions given all the information available to us, even though that information may be incomplete or ambiguous
Sample Space
• The sample space Ω is the set of possible outcomes of an experiment. Points ω in Ω are called sample outcomes, realizations, or elements. Subsets of Ω are called events.
• Example: if we toss a coin twice then Ω = {HH, HT, TH, TT}. The event that the first toss is heads is A = {HH, HT}.
• We say that events A1 and A2 are disjoint (mutually exclusive) if A1 ∩ A2 = ∅
  • Example: first flip being heads and first flip being tails
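The coin-toss example above can be sketched directly in code; this is a minimal illustration (not from the slides) that enumerates Ω and two disjoint events:

```python
from itertools import product

# Sample space for two coin tosses: Ω = {HH, HT, TH, TT}
omega = {"".join(p) for p in product("HT", repeat=2)}

# Event A: the first toss is heads. Event B: the first toss is tails.
A = {w for w in omega if w[0] == "H"}
B = {w for w in omega if w[0] == "T"}

print(sorted(A))       # ['HH', 'HT']
print(A & B == set())  # True: A and B are disjoint
```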
Probability
• We will assign a real number P(A) to every event A, called the probability of A.
• To qualify as a probability, P must satisfy three axioms:
  • Axiom 1: P(A) ≥ 0 for every A
  • Axiom 2: P(Ω) = 1
  • Axiom 3: If A1, A2, ... are disjoint then P(∪i Ai) = Σi P(Ai)
Joint and Conditional Probabilities
• Joint probability
  • P(X, Y)
  • Probability of X and Y
• Conditional probability
  • P(X|Y)
  • Probability of X given Y
Independent and Conditional Probabilities
• Assuming that P(B) > 0, the conditional probability of A given B:
  • P(A|B) = P(AB)/P(B)
  • P(AB) = P(A|B)P(B) = P(B|A)P(A)  (Product Rule)
• Two events A and B are independent if
  • P(AB) = P(A)P(B)  (joint = product of marginals)
• Two events A and B are conditionally independent given C if they are independent after conditioning on C
  • P(AB|C) = P(B|AC)P(A|C) = P(B|C)P(A|C)
If disjoint, are events A and B also independent?
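A quick numeric check of this question (a sketch, not from the slides), using the two-coin-toss sample space: disjoint events with positive probability are never independent, since P(AB) = 0 while P(A)P(B) > 0.

```python
# Two coin tosses, all four outcomes equally likely.
outcomes = ["HH", "HT", "TH", "TT"]
P = lambda event: sum(1 for w in outcomes if w in event) / len(outcomes)

A = {"HH", "HT"}  # first flip heads
B = {"TH", "TT"}  # first flip tails: disjoint from A
# Disjoint: P(AB) = 0, but P(A)P(B) = 0.25, so A and B are NOT independent.
print(P(A & B), P(A) * P(B))  # 0.0 0.25

C = {"HH", "TH"}  # second flip heads
# A and C ARE independent: P(AC) = 0.25 = P(A)P(C).
print(P(A & C) == P(A) * P(C))  # True
```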
Example
• 60% of ML students pass the final and 45% of ML students pass both the final and the midterm*
• What percent of students who passed the final also passed the midterm?

*These are made-up values.
Example
• 60% of ML students pass the final and 45% of ML students pass both the final and the midterm*
• What percent of students who passed the final also passed the midterm?
• Reworded: what percent of students passed the midterm, given they passed the final?
• P(M|F) = P(M, F)/P(F) = .45/.60 = .75

*These are made-up values.
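The slide's computation, as a two-line sketch (values are the slide's made-up numbers):

```python
p_final = 0.60           # P(F): pass the final
p_mid_and_final = 0.45   # P(M, F): pass both

# Conditional probability: P(M|F) = P(M, F) / P(F)
p_mid_given_final = p_mid_and_final / p_final
print(round(p_mid_given_final, 2))  # 0.75
```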
Marginalization and Law of Total Probability
• Marginalization (Sum Rule): P(X) = Σ_Y P(X, Y)
• Law of Total Probability: P(X) = Σ_Y P(X|Y)P(Y)
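Both rules can be demonstrated on a small joint table; this is an illustrative sketch with made-up probabilities, not from the slides:

```python
# Joint distribution P(X, Y) over two small discrete variables (made-up numbers).
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

# Sum rule (marginalization): P(X) = sum over Y of P(X, Y); same for P(Y).
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Law of total probability: P(X=x0) = sum over Y of P(X=x0|Y) P(Y)
p_x0 = sum((joint[("x0", y)] / p_y[y]) * p_y[y] for y in p_y)

print(round(p_x["x0"], 10), round(p_x["x1"], 10))  # 0.4 0.6
print(abs(p_x0 - p_x["x0"]) < 1e-12)               # True: both routes agree
```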
Bayes’ Rule
P(A|B) = P(AB)/P(B)                      (Conditional Probability)
P(A|B) = P(B|A)P(A)/P(B)                 (Product Rule)
P(A|B) = P(B|A)P(A) / Σ_A P(B|A)P(A)     (Law of Total Probability)
Example
• Suppose you have tested positive for a disease; what is the probability that you actually have the disease?
• It depends on the accuracy and sensitivity of the test, and on the background (prior) probability of the disease.
  • P(T=1|D=1) = .95 (true positive)
  • P(T=1|D=0) = .10 (false positive)
  • P(D=1) = .01 (prior)
• P(D=1|T=1) = ?
Example
• P(T=1|D=1) = .95 (true positive)
• P(T=1|D=0) = .10 (false positive)
• P(D=1) = .01 (prior)

Law of Total Probability:
P(T=1) = Σ_D P(T=1|D)P(D) = P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0) = .95 × .01 + .10 × .99 = .1085

Bayes’ Rule:
P(D=1|T=1) = P(T=1|D=1)P(D=1)/P(T=1) = .95 × .01/.1085 ≈ .088

The probability that you have the disease given you tested positive is about 8.8%.
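The same computation as a short sketch, using the slide's numbers:

```python
p_t_given_d1 = 0.95  # P(T=1|D=1), true positive rate
p_t_given_d0 = 0.10  # P(T=1|D=0), false positive rate
p_d1 = 0.01          # P(D=1), prior

# Law of total probability: P(T=1)
p_t = p_t_given_d1 * p_d1 + p_t_given_d0 * (1 - p_d1)

# Bayes' rule: P(D=1|T=1)
p_d_given_t = p_t_given_d1 * p_d1 / p_t

print(round(p_t, 4))          # 0.1085
print(round(p_d_given_t, 3))  # 0.088
```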
Random Variable
• How do we link sample spaces and events to data?
• A random variable is a mapping that assigns a real number X(ω) to each outcome ω
• Example: flip a coin ten times. Let X(ω) be the number of heads in the sequence ω. If ω = HHTHHTHHTT, then X(ω) = 6.
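The slide's example random variable, written as a function (a sketch):

```python
# X maps an outcome ω (a string of flips) to a real number: the count of heads.
def X(omega: str) -> int:
    return omega.count("H")

print(X("HHTHHTHHTT"))  # 6
```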
Discrete vs Continuous Random Variables
• Discrete: can take only a countable number of values
  • Example: number of heads
  • Distribution defined by a probability mass function (pmf)
  • Marginalization: p(x) = Σ_y p(x, y)
• Continuous: can take uncountably many values (e.g., real numbers)
  • Example: time taken to accomplish a task
  • Distribution defined by a probability density function (pdf)
  • Marginalization: p(x) = ∫ p(x, y) dy
Probability Distribution Statistics
• Mean: E[x] = μ (first moment)
  • Univariate continuous random variable: E[x] = ∫ x p(x) dx
  • Univariate discrete random variable: E[x] = Σ_x x p(x)
• Variance: Var(x) = E[(x − μ)²] = E[x²] − E[x]²
• Nth moment: E[x^N]
Bernoulli Distribution
Discrete Distribution

• RV: x ∈ {0, 1}
• Parameter: μ
• pmf: Bern(x|μ) = μ^x (1 − μ)^(1−x)
• Mean = E[x] = μ
• Variance = μ(1 − μ)

Example: probability of flipping heads (x = 1) with an unfair coin where μ = .6:
P(x = 1) = .6¹ (1 − .6)⁰ = .6
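A small sketch of the Bernoulli pmf and its moments (the μ = 0.6 unfair coin is the slide's example):

```python
def bernoulli_pmf(x: int, mu: float) -> float:
    """Bern(x|mu) = mu^x * (1 - mu)^(1 - x), for x in {0, 1}."""
    return mu**x * (1 - mu)**(1 - x)

mu = 0.6  # unfair coin
print(bernoulli_pmf(1, mu))  # 0.6 (heads)
print(bernoulli_pmf(0, mu))  # 0.4 (tails)

# Mean and variance computed from the pmf match the closed forms mu, mu*(1-mu).
mean = sum(x * bernoulli_pmf(x, mu) for x in (0, 1))
var = sum((x - mean)**2 * bernoulli_pmf(x, mu) for x in (0, 1))
print(mean, round(var, 10))  # 0.6 0.24
```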
Binomial Distribution
Discrete Distribution

• RV: m = number of successes
• Parameters: N = number of trials, μ = probability of success
• pmf: Bin(m|N, μ) = (N choose m) μ^m (1 − μ)^(N−m)
• Mean = E[m] = Nμ
• Variance = Nμ(1 − μ)

Example: probability of flipping heads m times out of 15 independent flips with success probability 0.2
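The slide's example (N = 15, μ = 0.2) as a sketch using the standard library:

```python
from math import comb

def binomial_pmf(m: int, N: int, mu: float) -> float:
    """Bin(m|N, mu) = C(N, m) * mu^m * (1 - mu)^(N - m)."""
    return comb(N, m) * mu**m * (1 - mu)**(N - m)

N, mu = 15, 0.2
pmf = [binomial_pmf(m, N, mu) for m in range(N + 1)]

print(round(sum(pmf), 10))                       # 1.0: a valid pmf
print(max(range(N + 1), key=lambda m: pmf[m]))   # 3: the mode, near N*mu = 3.0
```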
Multinomial Distribution
Discrete Distribution

• The multinomial distribution is a generalization of the binomial distribution to k categories instead of just binary (success/fail)
• For n independent trials, each of which leads to a success for exactly one of k categories, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories
• Example: rolling a die N times
Multinomial Distribution

Discrete Distribution

• RVs: m1 … mK (counts)
• Parameters: N = number of trials; μ = (μ1, …, μK), probability of success for each category, with Σk μk = 1
• pmf: Mult(m1, …, mK | N, μ) = (N! / (m1! ⋯ mK!)) ∏k μk^mk
• Mean of mk: Nμk
• Variance of mk: Nμk(1 − μk)
Ex: rolling a 2 on a fair die exactly 5 times out of 10 rolls. Collapsing the categories to {2} vs {not 2}, this is a multinomial with m = [5, 5], N = 10, μ = [1/6, 5/6]:

P = (10! / (5! 5!)) (1/6)⁵ (5/6)⁵ ≈ .013
Gaussian Distribution
Continuous Distribution

• Aka the normal distribution
• Widely used model for the distribution of continuous variables
• In the case of a single variable x, the Gaussian distribution can be written in the form
N(x | μ, σ²) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²))
• where μ is the mean and σ² is the variance
• The factor 1/√(2πσ²) is the normalization constant; the exponential term exp(−(x − μ)²/(2σ²)) is a (negative) quadratic function of x, the squared distance from the mean scaled by the variance
Gaussian Distribution

• Gaussians with different means and variances
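The effect of different means and variances can be seen numerically; a sketch of the univariate density (not from the slides):

```python
from math import exp, pi, sqrt

def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """N(x|mu, sigma^2) = 1/sqrt(2*pi*var) * exp(-(x - mu)^2 / (2*var))."""
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

# The density peaks at the mean; a larger variance flattens and widens it.
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))                      # 0.3989
print(gaussian_pdf(0.0, 0.0, 4.0) < gaussian_pdf(0.0, 0.0, 1.0))  # True
```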
Multivariate Gaussian Distribution

• For a D-dimensional vector x, the multivariate Gaussian distribution takes the form
N(x | μ, Σ) = 1/((2π)^(D/2) |Σ|^(1/2)) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
• where μ is a D-dimensional mean vector
• Σ is a D × D covariance matrix
• |Σ| denotes the determinant of Σ
Inferring Parameters
• We have data X and we assume it comes from some distribution
• How do we figure out the parameters that ‘best’ fit that distribution?
  • Maximum Likelihood Estimation (MLE)
  • Maximum a Posteriori (MAP)

See ‘Gibbs Sampling for the Uninitiated’ for a straightforward introduction to parameter estimation: http://www.umiacs.umd.edu/~resnik/pubs/LAMP-TR-153.pdf
I.I.D.
• Random variables are independent and identically distributed (i.i.d.) if each has the same probability distribution as the others and all are mutually independent.
• Example: coin flips are assumed to be i.i.d.
MLE for parameter estimation
• The parameters of a Gaussian distribution are the mean (μ) and variance (σ²)
• We’ll estimate the parameters using MLE
• Given observations x1, ..., xN, the likelihood of those observations for a certain μ and σ² (assuming i.i.d.) is

Likelihood = p(x1, ..., xN | μ, σ²) = ∏ₙ N(xn | μ, σ²)

Recall: if i.i.d., P(ABC) = P(A)P(B)P(C)
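The i.i.d. likelihood as a product of densities, sketched with hypothetical observations (the data values below are illustrative, not from the slides):

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, var):
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

data = [1.8, 2.1, 2.4, 1.9]  # hypothetical observations

def likelihood(mu, var):
    # i.i.d.: the joint density factorizes into a product over the data points.
    p = 1.0
    for x in data:
        p *= gaussian_pdf(x, mu, var)
    return p

# The likelihood is larger for parameters that fit the data better.
print(likelihood(2.0, 0.1) > likelihood(0.0, 0.1))  # True
```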
MLE for parameter estimation

What’s the distribution’s mean and variance?

Likelihood = ∏ₙ N(xn | μ, σ²)
MLE for Gaussian Parameters
• Now we want to maximize this function w.r.t. μ
• Instead of maximizing the product, we take the log of the likelihood, so the product becomes a sum
• We can do this because log is monotonically increasing, meaning the μ that maximizes the log likelihood also maximizes the likelihood

Likelihood = ∏ₙ N(xn | μ, σ²)
Log likelihood = log ∏ₙ N(xn | μ, σ²) = Σₙ log N(xn | μ, σ²)
MLE for Gaussian Parameters

• Log likelihood simplifies to:
log p(x1, ..., xN | μ, σ²) = −(1/(2σ²)) Σₙ (xn − μ)² − (N/2) log σ² − (N/2) log(2π)
• Now we want to maximize this function w.r.t. μ
• How?

To see proofs for these derivations: http://www.statlect.com/normal_distribution_maximum_likelihood.htm
MLE for Gaussian Parameters

• Now we want to maximize this function w.r.t. μ
• Take the derivative, set it to 0, and solve for μ:
μ_ML = (1/N) Σₙ xn  (the sample mean)
• The same procedure for σ² gives σ²_ML = (1/N) Σₙ (xn − μ_ML)²
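A sketch of the closed-form estimates on hypothetical data (values are illustrative, not from the slides), with a quick check that the sample mean really is the maximizer:

```python
data = [1.8, 2.1, 2.4, 1.9]  # hypothetical observations

# Closed-form MLE: mu_ML is the sample mean, var_ML the (biased) sample variance.
n = len(data)
mu_ml = sum(data) / n
var_ml = sum((x - mu_ml) ** 2 for x in data) / n
print(round(mu_ml, 2))  # 2.05

# Sanity check: among nearby candidates, the sample mean minimizes the sum of
# squared deviations (equivalently, maximizes the Gaussian log likelihood).
candidates = [mu_ml - 0.1, mu_ml, mu_ml + 0.1]
best = min(candidates, key=lambda mu: sum((x - mu) ** 2 for x in data))
print(best == mu_ml)  # True
```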
Maximum Likelihood and Least Squares

• Suppose that you are presented with a sequence of data points (X1, T1), ..., (Xn, Tn), and you are asked to find the “best fit” line passing through those points.
• In order to answer this you need to know precisely how to tell whether one line is “fitter” than another
• A common measure of fitness is the squared error

For a good discussion of maximum likelihood estimators and least squares see http://people.math.gatech.edu/~ecroot/3225/maximum_likelihood.pdf
Maximum Likelihood and Least Squares

y(x, w) is estimating the target t

• The error/loss/cost/objective function measures the squared error:
L(w) = ½ Σₙ (y(xn, w) − tn)²
• Least-squares regression: minimize L(w) w.r.t. w

(In the figure: the green lines are the individual errors; the red line is y(x, w).)
Maximum Likelihood and Least Squares

• Now we approach curve fitting from a probabilistic perspective
• We can express our uncertainty over the value of the target variable using a probability distribution
• We assume that, given the value of x, the corresponding value of t has a Gaussian distribution with a mean equal to the value y(x, w):
p(t | x, w, β) = N(t | y(x, w), β⁻¹)
• β is the precision parameter (inverse variance)
Maximum Likelihood and Least Squares

• We now use the training data {x, t} to determine the values of the unknown parameters w and β by maximum likelihood
Likelihood = p(t | x, w, β) = ∏ₙ N(tn | y(xn, w), β⁻¹)
• Log likelihood:
log p(t | x, w, β) = −(β/2) Σₙ (y(xn, w) − tn)² + (N/2) log β − (N/2) log(2π)
Maximum Likelihood and Least Squares

• Log likelihood:
log p(t | x, w, β) = −(β/2) Σₙ (y(xn, w) − tn)² + (N/2) log β − (N/2) log(2π)
• Maximize the log likelihood w.r.t. w
• Since the last two terms don’t depend on w, they can be omitted.
• Also, scaling the log likelihood by the positive constant β/2 does not alter the location of the maximum with respect to w, so it can be ignored
• Result: maximize −½ Σₙ (y(xn, w) − tn)²
Maximum Likelihood and Least Squares

• MLE: maximize −½ Σₙ (y(xn, w) − tn)²
• Least squares: minimize ½ Σₙ (y(xn, w) − tn)²
• Therefore, maximizing the likelihood is equivalent, so far as determining w is concerned, to minimizing the sum-of-squares error function
• Significance: the sum-of-squares error function arises as a consequence of maximizing likelihood under the assumption of a Gaussian noise distribution
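The deck's demo is in Matlab; here is a pure-Python sketch of the same idea, fitting a line y = w0 + w1·x by the 1-D least-squares (normal-equation) formulas on hypothetical, noise-free data:

```python
# Hypothetical data lying exactly on t = 1 + 2x, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]

n = len(xs)
x_mean = sum(xs) / n
t_mean = sum(ts) / n

# Least-squares slope and intercept for a single input variable.
w1 = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts)) / \
     sum((x - x_mean) ** 2 for x in xs)
w0 = t_mean - w1 * x_mean
print(w0, w1)  # 1.0 2.0

# The minimized sum-of-squares error (here zero, since the data are noise-free).
sq_err = sum((w0 + w1 * x - t) ** 2 for x, t in zip(xs, ts))
print(sq_err)  # 0.0
```

Under the Gaussian-noise assumption above, this minimizer of the squared error is exactly the maximum-likelihood w.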
Matlab Linear Regression Demo
Training Set
Training Set
Validation Set (Held-Out Data)
Training Set
Validation Set (Held-Out Data)

Model                   Training Set Error   Validation Set Error
Linear                  ++++                 +++++
Quadratic               +++                  ++++++
Cubic                   ++                   +++++++
4th-degree polynomial   +                    ++++++++
How well your model generalizes to new data is what matters!
Multivariate Gaussian Distribution

• For a D-dimensional vector x, the multivariate Gaussian distribution takes the form
N(x | μ, Σ) = 1/((2π)^(D/2) |Σ|^(1/2)) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
• where μ is a D-dimensional mean vector
• Σ is a D × D covariance matrix
• |Σ| denotes the determinant of Σ
Covariance Matrix
Questions?