basic econometrics notes
Econometrics is a branch of mathematical statistics concerned with applying statistics to non-experimental data. It is used for:
- Estimating economic relationships
- Testing economic theories
- Evaluating policy
- Forecasting macro variables

Non-experimental data is data that is not accumulated through controlled experiments on individuals, firms or segments of the economy. It is also referred to as observational data or retrospective data. Experimental data, on the other hand, is data collected in laboratory environments in the natural sciences. It is hard to devise experiments in economics because:
- They are often impossible to conduct
- They are prohibitively expensive
- The kinds of experiments needed to address economic issues can be morally repugnant

An empirical analysis employs data to test a theory or estimate a relationship.

4 steps in conducting an empirical analysis (S.E.E.F.)

1. Specification (F.F.D.A.)
This involves specifying the model, that is, formulating the question in terms of relationships between variables. An economic model consists of mathematical equations that describe various relationships. This entails looking at:
- First: the nature of stochastic functions
  - There is an unobservable 'population regression function' (PRF), which we try to approximate with our 'sample regression function' (SRF)
  - This gives us the concept of the 'conditional mean', $E(y|x)$ (E = expected)
- Second: the type of mathematical functions. The most commonly used functional forms are:
  - Linear functions: $y_t = \beta_0 + \beta_1 x_t$
  - Quadratic functions: $y_t = \beta_0 + \beta_1 x_t^2$
  - Power functions: $y_t = \beta_0 x_t^{\beta_1}$
  - Exponential functions: $y_t = \beta_0 e^{\beta_1 t}$
- Third: the choice of variables
  - This depends on the availability and quality of the data.
- Fourth: the assumptions made about the error terms
  - The error term, $u_t$, represents those factors that affect $y_t$ but are not included in the model
  - We can never know exactly what the impact on $y_t$ is of all the other variables
  - Because of this, we need to make certain assumptions about the nature of $u_t$
  - There are 5 key assumptions generally made about the error terms:
    1. The error terms are random variables with a normal distribution
    2. They have a zero mean, that is, $E(u_t) = 0$ for all $t = 1, 2, ..., n$
    3. They have the same variance, that is, $Var(u_t) = \sigma^2$ for all $t = 1, 2, ..., n$
    4. They are independent random variables, that is, $Cov(u_t, u_s) = 0$ for all $t \neq s$, with $t, s = 1, 2, ..., n$
    5. The error terms and the independent variables are independent of each other, so that $Cov(u_t, X_t) = 0$

2. Estimation
The model is estimated here, using the ordinary least squares (OLS) method. OLS is based on minimising the sum of the squared residuals, and it minimises this function with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$. (Anything with a hat is an estimated or predicted value.)

3. Evaluation
- During this stage we always examine the estimated residuals. If our model is correctly specified then the estimated residuals ($e$) are the sample estimates of the population error terms ($u$). We also look at the size and sign of the parameters ($\hat{\beta}_0$ and $\hat{\beta}_1$) to see if they make sense.
- Once we are satisfied that the model makes sense, we conduct tests to see if there is a significant relationship. Individual tests include t-tests and correlation analysis, and test the significance of individual variables. Joint tests test for overall significance and consist of $R^2$, adjusted $R^2$ ($\bar{R}^2$) and ANOVA.

4. Forecasting
When we talk about forecasts, we refer to forecasts from two different periods:
- Within-sample forecasts, which are forecasts generated within the sample period used to estimate the model
- Out-of-sample forecasts, which are forecasts generated for any period that was not used in estimating the model

There are also two types of forecasts, static and dynamic:
- Static forecasts are obtained by substituting known values of the independent variables into the estimated equation
- Dynamic forecasts are obtained by substituting values of the independent variables that are not actually known but are generated by the model

Most applications are concerned with out-of-sample static forecasts.

To estimate a model we need to convert the economic model into an estimable econometric model:
From this:        To this:
Econometric models may be used for hypothesis testing. For example, the parameter $\beta_3$ represents the effect of training on the wage. How large is the effect, and is it statistically different from zero?
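The sum of squared residuals minimised by OLS in the estimation step:

$$\sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} [y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t]^2$$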
Structure of Economic Data
To undertake an econometric analysis, data is required on the variables of interest.
- The type of econometric methods employed will depend on the nature of the data used.
- The use of an inappropriate estimation method may lead to misleading results.

There are four different kinds of economic data sets:
• Cross-sectional data
• Time series data
• Pooled cross sections
• Panel or longitudinal data
Cross-sectional data sets: These data sets consist of a sample of observations on individuals, households, firms, cities, states, countries, or other entities of interest, measured at a particular point in time (the average age example). Cross-sectional observations are assumed to have been obtained through pure random sampling from a population, so they are assumed to be independent. Pure random sampling is where each observation is drawn at random from a population, so that each individual observation is as likely as the next to be drawn, and each draw is independent of all other draws. This data is usually employed in fields of applied microeconomics.
Problems with this data and pure random sampling:
- Refusal to participate (leads to sample selection bias)
- Sampling can lead to clustering (where sampled entities are large relative to the population)
- Different variables can correspond to different time periods in cross-sectional data.
Time series data: This type of data set consists of observations on a variable or several variables, measured over time. Time series observations are not generally independent. They tend to be strongly related to their recent histories, that is, they are 'serially correlated'. Because of this dependency, modifications to standard estimation techniques have been developed to take it into account. Time series data typically display trends and seasonal patterns. Some of the more common frequencies at which data are collected are daily, weekly, monthly, quarterly, and annual intervals. Time series data is typically employed in fields of applied macroeconomics and finance.
Pooled cross sections (a combination of cross-sectional and time series data): These data sets have both a cross-sectional and a time series dimension. For instance, suppose that two household surveys are taken in Australia, one in 2010 and the other in 2014. Combining the two years to increase the sample size forms a pooled cross section. The cross sections are drawn independently of each other. For instance, a new random sample of households would be taken in 2014 to answer the same survey questions. Pooled cross sections can often be used to evaluate policy changes. For instance, suppose we wanted to evaluate the effect of a reduction in property taxes that occurred in 1994. We would collect a random sample of house prices for 1993, and a new random sample of house prices for 1995. We would then compare house prices before and after the reform was implemented, where 1993 is before the reform and 1995 is after it.
Note how the data set splits into the two years, 1993 and 1995.
Panel or Longitudinal Data
This type of data set consists of a time series for each cross-sectional entity. With panel data, the same cross-sectional units are followed over time. This feature distinguishes a panel data set from a pooled cross section. As such, panel data have a cross-sectional and a time series dimension. Observing the same entities over time leads to several advantages over cross-sectional or pooled cross-sectional data:
- Panel data allows us to control for time-invariant unobserved characteristics.
- Panel data allows us to model lagged responses.

Consider the following example on crime statistics for 150 cities. Each city is observed in two years, 1986 and 1990. A number of time-invariant unobserved city characteristics may be modelled. The effect of police on crime rates may exhibit a time lag, which can be modelled.
Example: two-year data on city crime for 150 cities.
Causality and Ceteris Paribus: Our goal is to infer that one variable has a causal effect on another variable. Causality is where a change in one variable leads to a change in another variable. Finding an association between two or more variables may be suggestive, but it is not conclusive unless causality is established (the change may be a coincidence, or a factor may not have been accounted for in the model). We need to hold all other factors constant in order to analyse the effect of one particular variable on another. If all other factors are not held constant, then it is impossible to know the causal effect of one variable on another. Examples where the problem of inferring causality arises include:
- Measuring the return to education
- Analysing the effect of law enforcement on crime levels
- Analysing the effect of the minimum wage on unemployment

Inferring causality is hard in these cases because, in a simple linear regression, it is difficult to isolate the effect of one variable when the outcome is clearly affected by other variables as well.
CHAPTER 2: The Simple Regression Model

Definitions: We are interested in explaining y in terms of x, or in how y varies with changes in x. In writing a model that explains this there are three issues:
- First, since there is never an exact relationship, how do we allow for other factors to affect y?
- Second, what is the functional form between y and x?
- Third, how can we be sure that we are capturing a ceteris paribus relationship between y and x?

Some of these can be resolved by writing the equation $y_t = \beta_0 + \beta_1 x_t + u_t$, which is known as the simple linear regression (SLR) model, the two-variable regression model or the bivariate regression model.
- The variables y and x have several different names that are used interchangeably:
  - The y variable is called the dependent variable, the explained variable, the response variable, the predicted variable, and the regressand
  - The x variable is called the independent variable, the explanatory variable, the control variable, the predictor variable, and the regressor
- The variable u represents all factors other than x that affect y. It is known as the error term, the disturbance term, the stochastic term, and the random term

Therefore, if the change in u is 0, then the change in y equals $\beta_1$ times the change in x: $\Delta y = \beta_1 \Delta x$. $\beta_0$ represents the intercept and $\beta_1$ represents the slope parameter in an SLR.

Using the assumption that $E(u|x) = 0$, and the fact that the parameters are constants (so $E(\beta_0) = \beta_0$ and $E(\beta_1) = \beta_1$), we can obtain what is known as the 'population regression function' (PRF):
$E(y|x) = E(\beta_0 + \beta_1 x + u \,|\, x)$
$E(y|x) = E(\beta_0 + \beta_1 x \,|\, x) + E(u|x)$
$E(y|x) = E(\beta_0) + E(\beta_1 x \,|\, x) + E(u|x)$
Therefore: $E(y|x) = \beta_0 + \beta_1 x$.
The linearity of the PRF means that a one-unit increase in x changes the expected value of y by the slope amount, $\beta_1$.

Deriving OLS estimates
The line of best fit is called the sample regression function (SRF), that is:
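$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

For any $x_i$ value, the difference between the actual value of $y_i$ and the value given by the sample regression function is called the residual, $\hat{u}_i$, where:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$$

For a correctly specified model the residual is the sample estimate of the error term. Like the error term ($u$), it represents that part of the value of the variable y that the estimated linear model is unable to explain. The OLS method states that we should choose the SRF line that has the smallest residuals, that is, the line that minimises the amount of variation in y that cannot be explained by the model. To measure this variation we could sum the residuals. However, the sum would equal 0 because the negative and positive residuals cancel out. To avoid this we sum their squared values:

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} [y_i - \hat{y}_i]^2 = \sum_{i=1}^{n} [y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i]^2$$

The OLS procedure minimises this function with respect to the two unknowns $\hat{\beta}_0$ and $\hat{\beta}_1$. Using calculus, the necessary conditions for a local extremum are:

$$\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

$$\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

These are expanded to give the 'normal equations':

$$\sum y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum x_i = 0$$

$$\sum x_i y_i - \hat{\beta}_0 \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0$$

And solved:

$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

THESE FORMULAS ARE VERY IMPORTANT!

Using an example with $n = 7$ observations we have:

$$\sum_{i=1}^{n} y_i = 76500, \quad \sum_{i=1}^{n} x_i = 190, \quad \sum_{i=1}^{n} x_i y_i = 2116000, \quad \sum_{i=1}^{n} x_i^2 = 5242, \quad \bar{x} = 27.143, \quad \bar{y} = 10928.571$$

So that:

$$\hat{\beta}_1 = \frac{7(2116000) - (190)(76500)}{7(5242) - (190)^2} = 466.330$$

$$\hat{\beta}_0 = 10928.571 - 466.330(27.143) = -1728.956 \quad \text{(using unrounded intermediate values)}$$

And we get a regression function of:

$$\hat{y}_i = -1728.956 + 466.330 x_i$$

Here x is the temperature in °C and y is the monthly sales of Fisher-Hausen. We interpret this as:
- When the temperature is equal to 0°C, the predicted level of monthly sales for Fisher-Hausen is -$1,728.96
- An increase in temperature of 1°C leads to an increase in monthly sales for Fisher-Hausen of $466.33
- Suppose that the temperature is predicted to be 20°C in August. What would be the level of sales in August earned by Fisher-Hausen? $\hat{S}_i = -1728.956 + 466.330(20) = \$7597.64$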
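As a check on the arithmetic in the Fisher-Hausen example, the closed-form OLS solutions can be evaluated directly from the summary sums quoted in the notes. This is a minimal sketch: the function name `ols_from_sums` and the variable names are illustrative choices, not part of the original notes.

```python
def ols_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope and intercept of a simple linear regression from summary sums."""
    beta1_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    beta0_hat = sum_y / n - beta1_hat * sum_x / n  # y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Sums quoted in the worked example (n = 7 observations).
beta0_hat, beta1_hat = ols_from_sums(
    n=7, sum_x=190, sum_y=76500, sum_xy=2116000, sum_x2=5242
)

# Static forecast of monthly sales at a temperature of 20 degrees Celsius.
sales_at_20 = beta0_hat + beta1_hat * 20

print(f"beta0_hat = {beta0_hat:.3f}")
print(f"beta1_hat = {beta1_hat:.3f}")
print(f"forecast  = {sales_at_20:.2f}")
```

Working from the sums rather than the rounded means avoids the small rounding differences that appear when $\bar{x}$ and $\bar{y}$ are rounded to three decimals first.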
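OLS sample data properties:

1. The sum of the OLS residuals is zero, because

$$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

2. The sample covariance between the regressors and the OLS residuals is zero, because

$$\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = \sum_{i=1}^{n} x_i \hat{u}_i = 0$$

3. The point $(\bar{x}, \bar{y})$ is always on the estimated OLS regression line. In other words, if we take the sample regression function and plug in $\bar{x}$ for x, then the predicted value is $\bar{y}$.

When evaluating the quality of our estimated model we use the variance of the error terms. To obtain an unbiased estimate of the variance of the error term we use the following formula:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2}{n - k - 1} = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - k - 1}$$

where k is the number of parameters estimated, excluding the constant, and the mean of the residuals is $\bar{\hat{u}} = 0$.

The standard error is:

$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - k - 1}}$$

The difference between an observed y and the sample mean $\bar{y}$ is the total variation.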
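The three sample properties and the variance estimator above can be verified numerically. The sketch below uses made-up data (the `x` and `y` values are hypothetical, not the Fisher-Hausen observations, which the notes do not list) and plain Python only.

```python
def fit_slr(x, y):
    """OLS intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical sample: temperatures (x) and monthly sales (y).
x = [20.0, 22.0, 25.0, 27.0, 30.0, 32.0, 34.0]
y = [7500.0, 8900.0, 9800.0, 11200.0, 12100.0, 13400.0, 14300.0]

beta0_hat, beta1_hat = fit_slr(x, y)
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]

# Property 1: the residuals sum to zero (up to floating-point error).
sum_resid = sum(residuals)
# Property 2: the residuals are orthogonal to the regressor.
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))
# Property 3: the fitted line passes through (x_bar, y_bar).
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
fitted_at_mean = beta0_hat + beta1_hat * x_bar

# Unbiased estimate of the error variance, with k = 1 regressor.
k = 1
sigma2_hat = sum(ui ** 2 for ui in residuals) / (len(x) - k - 1)
sigma_hat = sigma2_hat ** 0.5

print(sum_resid, sum_x_resid, fitted_at_mean - y_bar, sigma_hat)
```

The first three printed values should be zero apart from rounding noise; the last is the standard error of the regression for this hypothetical sample.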
[Figure: plot of Y (axis from 6,000 to 16,000) against X (axis from 20 to 34).]
Explained variation is the distance between the horizontal line (the sample mean, $\bar{Y}$) and the point on the OLS line. The unexplained variation is the bracketed distance, $\hat{u}$. Total variation is therefore $Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$. The sums of squares decompose in the same way because the OLS residuals are uncorrelated with the regressor.

If we call the distance between $Y_1$ and $\bar{Y}$ the Total Variation, then this can be divided into two parts:
- The first part is the difference between the value predicted by the model and the mean value. This is called the Explained Variation because it is the component of the variation of $Y_i$ from $\bar{Y}$ which our regression model can explain. For observation 1, we have:
  Explained Variation (ESS): $\hat{Y}_1 - \bar{Y} = \hat{\beta}_1 (X_1 - \bar{X})$
- The second part is the difference between the actual value of Y and the value that the model predicts. This is called the Residual or Chance Variation because it is the component of variation that the model is unable to explain:
  Residual Variation (RSS): $Y_1 - \hat{Y}_1 = \hat{u}_1$

Total variation: TSS = ESS + RSS

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

In deviation form, with $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$:

$$\sum_{i=1}^{n} y_i^2 = \hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \hat{u}_i^2$$

Once the model has been estimated, its overall goodness of fit must be assessed. This is measured by the coefficient of determination, or R squared. If we divide both sides by the TSS, we get: 1 = (ESS/TSS) + (RSS/TSS)

R squared:

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

The value of $R^2$ lies between 0 and 1, that is, $0 \le R^2 \le 1$.
- If ESS = TSS, then $R^2 = 1$. This implies a perfect fit. $R^2$ is a measure of the goodness of fit of the estimated regression model to the sample data, and the better this fit is, the closer $R^2$ is to a value of 1.
- If ESS = 0, then $R^2 = 0$.
  - This suggests that the regression model does not provide better estimates of $Y_i$ than the sample mean $\bar{Y}$.
  - More importantly, a finding that $R^2 = 0$ suggests that a linear relationship does not exist between $Y_i$ and the regressors (x variables).
  - A non-linear relationship may still exist between $Y_i$ and the regressors; $R^2 = 0$ only rules out a linear one.
- Whether the obtained $R^2$ is high enough depends on the actual problem studied and the type of data that has been used.
- As a rule of thumb, in the case of cross-sectional data, $R^2 \ge 0.5$ is usually considered to be a good enough fit. The threshold is lower because cross-sectional data is a snapshot in time and does not pick up the common movements over time that push $R^2$ up in time series. This is why time series data requires a higher R squared.
- In time series data, $R^2 \ge 0.9$ is usually considered to be a good enough fit.
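The decomposition TSS = ESS + RSS and the two equivalent expressions for $R^2$ can be checked on a small sample. This is a sketch on hypothetical data (the `x` and `y` lists are invented for illustration), written in plain Python.

```python
# Hypothetical sample, not taken from the notes.
x = [20.0, 22.0, 25.0, 27.0, 30.0, 32.0, 34.0]
y = [7500.0, 8900.0, 9800.0, 11200.0, 12100.0, 13400.0, 14300.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the deviation-form formulas.
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = [beta0_hat + beta1_hat * xi for xi in x]

# Sums of squares.
tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ess = sum((fi - y_bar) ** 2 for fi in y_hat)             # explained variation
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # residual variation

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss

print(f"TSS = {tss:.2f}, ESS + RSS = {ess + rss:.2f}")
print(f"R^2 = {r2_from_ess:.4f} (ESS/TSS), {r2_from_rss:.4f} (1 - RSS/TSS)")
```

Both expressions for $R^2$ agree only because the model includes a constant; without one, TSS = ESS + RSS no longer holds in general.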
The number of regressors does not affect the TSS; it is fixed for a given sample. The RSS decreases, or at least does not increase, as more and more regressors are included in the model. $R^2$ is therefore non-decreasing in k: it rises whenever an additional independent variable helps explain any of the sample variation in the dependent variable, and never falls. The value of $R^2$ is constrained by the sample size, because k cannot be increased indefinitely: we need df = n - (k+1) ≥ 0. The OLS estimators do not exist if df < 0, and the larger the df, the more reliable the estimated model.

A question that is commonly asked is: 'should we include a constant'? If we don't, then we are forcing the model to pass through the origin, that is, zero on both the y-axis and the x-axis. In effect, we are imposing the restriction that $\beta_0 = 0$, which may be theoretically incorrect. For instance, in estimating a consumption function, imposing $\beta_0 = 0$ may be invalid, because consumption expenditure is also financed through savings and other forms of wealth, and not just directly through income (people still consume when their income is zero). We should almost always include $\beta_0$ in our estimated model. Even if $\hat{\beta}_0$ from the estimated model is very close to zero in magnitude, and is statistically insignificant, it is good to leave it in. Do not exclude $\beta_0$ from the estimated model unless you have a strong theoretical reason to do so, which is rare.

Chapter 3: Multiple Regression, Estimation
A simple linear regression model is quite limited, because we try to explain one variable in terms of only one other. In contrast, a multiple linear regression model deals with several explanatory variables. This helps us avoid omitted variable bias, because we can include factors that would otherwise be left in the error term. We interpret each coefficient the same way as in a simple linear regression, holding the other regressors constant. To help us choose which variables to include in our model we use a correlation matrix.
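The symbol ρ (rho, not sigma) denotes the correlation coefficient between two variables. If ρ is close to +1 there is a strong positive linear relationship. Close to -1 means there is a strong negative linear relationship, with +1 or -1 being a perfect linear relationship. A value of 0 means that x and y are uncorrelated, that is, they have no linear relationship (this alone does not guarantee that they are independent). If we square ρ, this gives us the proportion of the variance in one variable that can be linearly related to changes in the other.
• |ρ| > 0.8: strong correlation, with ρ² > 0.64 (roughly 2/3)
• |ρ| < 0.5: weak correlation, with ρ² < ¼
• In between: moderate correlation
A correlation matrix lists ρ for every pair of variables.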
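To make the idea of a correlation matrix concrete, the sketch below computes ρ for every pair of variables in a small data set. The variable names (`wage`, `educ`, `exper`) and all values are hypothetical, chosen only for illustration, and the helper names `pearson` and `correlation_matrix` are assumptions rather than anything defined in the notes.

```python
from math import sqrt


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def correlation_matrix(data):
    """Nested dict: matrix[row][col] is the correlation of the two columns."""
    names = list(data)
    return {r: {c: pearson(data[r], data[c]) for c in names} for r in names}


# Hypothetical observations on three variables.
data = {
    "wage": [12.0, 15.5, 11.0, 20.0, 17.5, 14.0],
    "educ": [10.0, 12.0, 9.0, 16.0, 14.0, 12.0],
    "exper": [8.0, 5.0, 12.0, 3.0, 6.0, 7.0],
}

corr = correlation_matrix(data)
for row_name, row in corr.items():
    print(row_name, [round(value, 3) for value in row.values()])
```

The diagonal is always 1 and the matrix is symmetric, so only the entries below (or above) the diagonal carry information. Large off-diagonal values between two candidate regressors are an early warning of multicollinearity.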
Statistical significance: a probability value (p-value) of less than 0.05. The t-test for significance is a formula to learn; it is given below for the multiple regression case.

Learn what level-log, level-level and log-log relationships are.

Must know these formulas:

PRF: $y = X\beta + u$

SRF: $\hat{y} = X\hat{\beta}$

OLS: $\hat{\beta} = (X'X)^{-1} X'y$

When working with a multiple regression the t-stat is:

$$t_{n-k-1} = \frac{\hat{\beta}_k - \beta_k}{\sqrt{Var(\hat{\beta}_k)}} = \frac{\hat{\beta}_k - \beta_k}{SE(\hat{\beta}_k)}$$

Multicollinearity is when several independent variables are linearly related. This can alter results drastically. Two types can occur: exact and near.

If two or more independent variables have an exact linear relationship between them, then we have perfect or exact multicollinearity. There is no unique solution to the normal equations obtained by OLS. When there is an exact linear relationship among the variables, the variance-covariance values become undefined. It is not possible to estimate the regression coefficients in the case of exact multicollinearity. Most computer programs will highlight this by displaying an error message, such as 'matrix singular', 'near singular matrix', or 'exact collinearity encountered'.

In most practical situations, there is a close but not exact relationship between the X variables in the model. In this instance, we do obtain parameter estimates, and estimation packages don't tell us that we have a highly collinear relationship among the independent variables. When these estimates have very large variances, it means that their values vary widely from one sample to another. So when a new set of observations becomes available at the end of the month or the end of the quarter and we use these new values to re-estimate our model, we may obtain estimates that differ significantly from the ones we are now using. A general rule of thumb used to indicate whether multicollinearity is a problem is $R_j^2 \ge 0.90$, where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other regressors. This implies that the variance inflation factor $VIF_j = 1/(1 - R_j^2) \ge 10$.

Why don't we ignore multicollinearity?
- First, if two or more explanatory variables are exactly linearly related, then the model cannot be estimated.
- Second, if some explanatory variables are nearly linearly related, then the OLS estimators are still BLUE, but have very large variances and covariances, making precise estimation difficult.
- Third, near multicollinearity increases the standard errors of the regression coefficients and reduces the t-statistics, making coefficients less significant and possibly even insignificant.
- Fourth, the covariance between the regression coefficients of a pair of highly correlated variables will be very high, in absolute terms, making it difficult to interpret individual coefficients.

Multicollinearity (MC) is a problem with the data, not the model. Symptoms:
- A high R squared combined with low t-stat values.
- High values for the correlation coefficients between the independent variables. This is different from a high correlation coefficient between the dependent variable and an independent variable.
- Regression coefficients that appear to be sensitive to model specification.

Note that multicollinearity may still be present even if the pairwise correlation coefficients are quite low. This is because 3 or more variables may be linearly related, yet their pairwise correlation coefficients may not be high. In such instances, multicollinearity is observed when regression coefficients are drastically altered, possibly even reversing signs, when variables are added or dropped.

To solve MC:
- Eliminate variables (but this can lead to omitted variable bias)
- Increase the sample size

When you have a multiple regression model, adding a new independent variable, no matter how irrelevant or ridiculous it is, either leaves the $R^2$ the same or increases its value. Thus, we are unable to use $R^2$ to compare two models with different numbers of independent variables. We must then use the adjusted R squared.
The degrees of freedom for the sums of squares are:

TSS: n - 1        ESS: (k+1) - 1 = k        RSS: n - (k+1)

and $df_{TSS} = df_{ESS} + df_{RSS}$, where k represents the number of regressors, and (k+1) is the number of parameters to be estimated, which includes the constant term.

The adjusted R squared is:

$$\bar{R}^2 = 1 - \frac{RSS/df_{RSS}}{TSS/df_{TSS}} = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)} = 1 - \frac{n-1}{n-k-1} \cdot \frac{RSS}{TSS} = 1 - \frac{n-1}{n-k-1}(1 - R^2)$$

Although a higher $\bar{R}^2$ is usually preferred to a lower one, one should never attempt to maximise the value of the adjusted $R^2$ at the expense of economic theory.
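The penalty that the adjusted R squared places on extra regressors can be seen by holding $R^2$ fixed and varying k. The sketch below is illustrative only: the function name `adjusted_r2` and the numbers passed to it are assumptions, not results from the notes.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k regressors (plus a constant)."""
    df_rss = n - k - 1
    if df_rss <= 0:
        raise ValueError("need n - k - 1 > 0 degrees of freedom")
    return 1 - (n - 1) / df_rss * (1 - r2)


# Same R squared and sample size, different numbers of regressors (hypothetical values).
small_model = adjusted_r2(r2=0.90, n=30, k=3)
large_model = adjusted_r2(r2=0.90, n=30, k=10)

print(f"k = 3:  adjusted R^2 = {small_model:.4f}")
print(f"k = 10: adjusted R^2 = {large_model:.4f}")
```

With the same raw $R^2$, the model with more regressors gets the lower adjusted value, which is why $\bar{R}^2$ can be used to compare models with different numbers of independent variables.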