Page 1: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Gauss-Markov Assumptions, Full Ideal Conditions of OLS

The full ideal conditions consist of a collection of assumptions about the true regression model and the data generating process and can be thought of as a description of an ideal data set. The ideal conditions have to be met in order for OLS to be a good estimator (BLUE: unbiased and efficient).

Most real data do not satisfy these conditions, since they are not generated by an ideal experiment. However, the linear regression model under full ideal conditions can be thought of as the benchmark case with which other models assuming a more realistic DGP should be compared.

One has to be aware of the ideal conditions and their violation to be able to control for deviations from these conditions and render results unbiased or at least consistent:

1. Linearity in parameters alpha and beta: the DV is a linear function of a set of IVs and a random error component

→ Problems: non-linearity, wrong determinants, wrong estimates; a relationship that is actually there cannot be detected with a linear model

2. The expected value of the error term is zero for all observations:

E(ε_i) = 0

→ Problem: the intercept is biased

Page 2: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

3. Homoskedasticity: the conditional variance of the error term is constant in all x and over time. The error variance is a measure of model uncertainty; homoskedasticity implies that the model uncertainty is identical across observations:

V(ε_i) = E(ε_i²) = σ² = constant

→ Problem: heteroskedasticity – the variance of the error term differs across observations – model uncertainty varies from observation to observation – often a problem in cross-sectional data; omitted variables bias

4. The error terms are independently distributed and not correlated; no correlation between observations of the DV:

Cov(ε_i, ε_j) = E(ε_i ε_j) = 0 for i ≠ j

→ Problem: spatial correlation (panel and cross-sectional data), serial correlation/autocorrelation (panel and time-series data)

Page 3: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

5. X_i is deterministic: x is uncorrelated with the error term since x_i is deterministic:

→ Problems: omitted variable bias, endogeneity and simultaneity

6. Other problems: measurement errors, multicollinearity

If all Gauss-Markov assumptions are met then the OLS estimators alpha and beta are BLUE – best linear unbiased estimators:

best: the variance of the OLS estimator is minimal, smaller than the variance of any other linear unbiased estimator

linear: if the relationship is not linear – OLS is not applicable

unbiased: the expected values of the estimated beta and alpha equal the true values describing the relationship between x and y

Cov(X_i, ε_i) = E(X_i ε_i) − E(X_i)·E(ε_i)
             = X_i E(ε_i) − X_i E(ε_i)      (since X_i is deterministic)
             = 0

Page 4: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Inference

Is it possible to generalize the regression results for the sample under observation to the universe of cases (the population)?

Can you draw conclusions for individuals, countries, time-points beyond those observations in your data-set?

• Significance tests are designed to answer exactly these questions.

• If a coefficient is significant (p-value < 0.10, 0.05, 0.01) then you can draw conclusions for observations beyond the sample under observation.

But…

• Only if the sample matches the characteristics of the population

• This is normally the case if all (Gauss-Markov) assumptions of OLS regressions are met by the data under observation.

• If this is not the case, the standard errors of the coefficients might be biased and therefore the result of the significance test might be wrong as well, leading to false conclusions.

Page 5: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Significance test: the t-test

Var(β̂) = σ² / (N · Var(X))

Page 6: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

The t-test:
• t-test for significance: testing the H0 (null hypothesis) that beta equals zero: H0: beta = 0; HA: beta ≠ 0

• The test statistic follows a Student t distribution under the Null

• t_crit is the critical value of a t-distribution for a specific number of observations and a specific level of significance: the convention in statistics is a significance level of 5% (2.5% on each side of the t-distribution for a 2-sided test); a coefficient is significant at this level if its p-value is below 0.05.

t = (β̂ − β) / SE(β̂)

SE(β̂) = √( σ̂² / (N · Var(X)) ) = √( (SSR / (n − 2)) / (N · Var(X)) )

Page 7: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Assume the estimated beta is 1 and its estimated standard error is 0.8. The critical value of the two-sided symmetric Student t-distribution for n = ∞ and alpha = 5% is 1.96.

Acceptance at the 5% level:

The Null (no significant relationship) will not be rejected if:

−1.96 ≤ t ≤ 1.96,   where   t = (β̂ − 0) / SE(β̂)

This condition can be expressed in terms of beta by substituting for t:

−1.96 ≤ (β̂ − 0) / SE(β̂) ≤ 1.96

Multiplying through by the SE of beta:

−1.96 · SE(β̂) ≤ β̂ − 0 ≤ 1.96 · SE(β̂)

Then:

0 − 1.96 · SE(β̂) ≤ β̂ ≤ 0 + 1.96 · SE(β̂)

Substituting beta = 1 and SE(beta) = 0.8:

0 − 1.96 · 0.8 ≤ 1 ≤ 0 + 1.96 · 0.8

−1.568 ≤ 1 ≤ 1.568

Since this inequality holds true, the null hypothesis is not rejected. Thus, we accept that there is no significant relationship between x and y; at the 5% level the data are compatible with beta being zero.

Page 8: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

[Figure: probability density function of beta under the Null, with the acceptance region between −1.568 and 1.568 and 2.5% of the probability mass in each tail.]

Page 9: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Now assume that the standard error of beta is 0.4 instead of 0.8; we get:

0 − 1.96 · 0.4 ≤ 1 ≤ 0 + 1.96 · 0.4

−0.784 ≤ 1 ≤ 0.784

This inequality does not hold, therefore we reject the null hypothesis that beta equals zero and decide in favour of the alternative hypothesis that there is a significant positive relationship between x and y.

[Figure: probability density function of beta under the Null, with the acceptance region between −0.784 and 0.784 and 2.5% of the probability mass in each tail.]

Page 10: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Significance test – rule of thumb:

If the regression coefficient (beta) is at least twice as large as the corresponding standard error of beta, the result is statistically significant at the 5% level.

Page 11: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Power of a test

For a given test statistic and a critical region of a given significance level, we define the probability of rejecting the null hypothesis when it is false as the power of a test.

The power would be optimal if the probability of rejecting the null were 1 if there is a relationship and 0 otherwise.

This is, however, not the case in reality. There is always a positive probability of drawing the wrong conclusions from the results:

• One can reject the null hypothesis even though it is true (type I error, alpha error):
alpha = Pr[Type I Error] = Pr[rejecting H0 | H0 is true]

• Or not reject the null hypothesis even though it is wrong (type II error, beta error):
beta = Pr[Type II Error] = Pr[accepting H0 | Ha is true]

Page 12: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Type I and Type II errors

Alpha and beta errors: an example:

A criminal trial: taking as the null hypothesis that the defendant is innocent, a type I error occurs when the jury wrongly decides that the defendant is guilty. A type II error occurs when the jury wrongly acquits the defendant.

In significance test:

H0: beta is insignificant = 0:

Type I error: wrongly rejecting the null hypothesis

Type II error: wrongly accepting the null that the coefficient is zero.

The selection of the significance level increases or decreases the probability of type I and type II errors.

The smaller the significance level (5%, 1%), the lower the probability of type I errors and the higher the probability of type II errors.

Page 13: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Confidence Intervals

Significance tests assume that hypotheses come before the test: beta ≠ 0. However, the significance test leaves us with some vacuum, since we know that beta is different from zero but, because we have a probabilistic theory, we are not sure what the exact value should be.

Confidence intervals give us a range of numbers that are plausible and are compatible with the hypothesis.

As for significance tests, the researcher has to choose the level of confidence (95% is the convention).

Using the same example again: the estimated beta is 1 and SE(beta) is 0.4; the critical values of the two-sided t-distribution are 1.96 and −1.96.

Page 14: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Calculation of the confidence interval: The question is how far a hypothetical value can differ from the estimated result before it becomes incompatible with the estimated value.

The regression coefficient b and the hypothetical value beta are incompatible if either

(b − β) / SE(b) > t_crit    or    (b − β) / SE(b) < −t_crit

That is, beta is compatible with b if it satisfies the double inequality:

b − SE(b) · t_crit ≤ β ≤ b + SE(b) · t_crit

Any hypothetical value of beta that satisfies this inequality will therefore automatically be compatible with the estimate b, that is, will not be rejected. The set of all such values, given by the interval between the lower and upper limits of the inequality, is known as the confidence interval for b. The centre of the confidence interval is the estimated b.

If the 5% significance level is adopted the corresponding confidence interval is known as the 95% confidence interval (1% – 99%).

Page 15: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Since the critical value of the t distribution is greater for the 1% level than for the 5% level, for any given number of degrees of freedom, it follows that the 99% interval is wider than the 95% interval and encompasses all the hypothetical values of beta in the 95% confidence interval plus some more on either side.

Example: b = 1, se(b) = 0.4, 95% confidence interval, t_critical = 1.96, −1.96:

1 − 0.4·1.96 ≤ beta ≤ 1 + 0.4·1.96

95% confidence interval:

0.216 ≤ beta ≤ 1.784

Thus, all values between 0.216 and 1.784 are theoretically possible and would not be rejected. They are compatible with the estimated b = 1. 1 is the central value of the confidence interval.
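To make the arithmetic concrete, here is a minimal Python sketch of the same calculation (values taken from the example above; the 1.96 critical value assumes a large sample):

```python
# Minimal sketch (values from the slide's example): 95% CI for beta with
# b = 1 and SE(b) = 0.4, using the large-sample critical value 1.96.
b, se, t_crit = 1.0, 0.4, 1.96

lower, upper = b - t_crit * se, b + t_crit * se
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")          # [0.216, 1.784]

# beta = 0 lies outside the interval, so H0: beta = 0 is rejected at the 5% level.
print("reject H0" if not (lower <= 0.0 <= upper) else "do not reject H0")
```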

Page 16: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Interpretation of regression results: reg y x

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   89.78
       Model |  1248.96129     1  1248.96129           Prob > F      =  0.0000
    Residual |   1363.2539    98  13.9107541           R-squared     =  0.4781
-------------+------------------------------           Adj R-squared =  0.4728
       Total |  2612.21519    99   26.386012           Root MSE      =  3.7297

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.941914   .2049419     9.48   0.000     1.535213    2.348614
       _cons |   .8609647   .4127188     2.09   0.040     .0419377    1.679992
------------------------------------------------------------------------------

Degrees of freedom: number of observations minus number of estimated parameters, in this case alpha and beta: 100 − 2 = 98. If we had 2 explanatory variables the number of degrees of freedom would decrease to 97, with 3 to 96, etc.

The concept of DoF implies that you cannot have more explanatory variables than observations!
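As an illustration of how the entries of such a table are computed, a minimal numpy/scipy sketch follows; the data are simulated here (not the data behind the Stata output above), and all variable names are arbitrary:

```python
# Minimal sketch: compute coefficient, SE, t, p and 95% CI for an OLS regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=3.7, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix: constant + x
k = X.shape[1]                                # number of estimated parameters
beta = np.linalg.solve(X.T @ X, X.T @ y)      # OLS estimates
resid = y - X @ beta
dof = n - k                                   # degrees of freedom: 100 - 2 = 98
sigma2 = resid @ resid / dof                  # error variance estimate (Root MSE squared)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t = beta / se                                 # t statistics for H0: coefficient = 0
p = 2 * stats.t.sf(np.abs(t), dof)            # two-sided p-values
t_crit = stats.t.ppf(0.975, dof)
ci = np.column_stack([beta - t_crit * se, beta + t_crit * se])

for name, b, s, tt, pp, (lo, hi) in zip(["_cons", "x"], beta, se, t, p, ci):
    print(f"{name:>6}: coef={b: .4f} se={s:.4f} t={tt:.2f} p={pp:.3f} CI=[{lo:.3f}, {hi:.3f}]")
```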

Page 17: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Definitions

Total Sum of Squares (SST):

SST = Σ_i (y_i − ȳ)²

Explained (Estimation) Sum of Squares (SSE):

SSE = Σ_i (ŷ_i − ȳ)²

Residual Sum of Squares or Sum of Squared Residuals (SSR):

SSR = Σ_i (y_i − α̂ − β̂x_i)² = Σ_i ε̂_i²

(all sums run over i = 1, …, n)

Page 18: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Goodness of Fit

How well does the explanatory variable explain the dependent variable?

How well does the regression line fit the data?

The R-squared (coefficient of determination) measures how much of the variation of the dependent variable can be explained by the explanatory variables.

The R² is the ratio of the explained variation to the total variation: it is interpreted as the fraction of the sample variation in y that is explained by x.

Explained variation of y / total variation of y:

R² = SSE / SST = Σ_i (Ŷ_i − Ȳ)² / Σ_i (Y_i − Ȳ)² = 1 − SSR / SST
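A small sketch, on simulated data, of how SST, SSE, SSR and the R² relate (the identity SST = SSE + SSR holds in models with an intercept):

```python
# Minimal sketch: SST, SSE, SSR and R-squared for a simple regression,
# using simulated data (x, y are illustrative only).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 + 1.5 * x + rng.normal(size=200)

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)           # residual sum of squares

print(np.isclose(sst, sse + ssr))        # True: SST = SSE + SSR (with an intercept)
print("R2 =", sse / sst, "=", 1 - ssr / sst)
```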

Page 19: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Properties of R²:

• 0 ≤ R² ≤ 1; often the R² is multiplied by 100 to get the percentage of the sample variation in y that is explained by x
• If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case the R² equals 1 or 100%.
• A value of R² that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the y_i is captured by the variation in the ŷ_i (which all lie on the regression line)
• R² = (corr(y, ŷ))²
• The R² follows a complex distribution which depends on the explanatory variables
• Adding further explanatory variables leads to an increase in the R²
• The R² can have a reasonable size in spurious regressions if the regressors are non-stationary
• Linear transformations of the regression model change the value of the R² coefficient
• The R² is not bounded between 0 and 1 in models without an intercept

Page 20: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Properties of an Estimator
1. Finite Sample Properties

There is often more than one possible estimator to estimate a relationship between x and y (e.g. OLS or Maximum Likelihood).

How do we choose between two estimators? The two most commonly used selection criteria are bias and efficiency.

Bias and efficiency are finite sample properties, because they describe how an estimator behaves when we only have a finite sample (even though the sample might be large).

In comparison, so-called "asymptotic properties" of an estimator have to do with the behaviour of estimators as the sample size grows without bound.

Since we always deal with finite samples and it is hard to say whether asymptotic properties translate to finite samples, examining the behaviour of estimators in finite samples seems to be more important.

Page 21: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Unbiasedness

Unbiasedness: the estimated coefficient is on average true:

E(β̂) = β

That is: in repeated samples of size n the mean outcome of the estimate equals the true – but unknown – value of the parameter to be estimated.

If an estimator is unbiased, then its probability distribution has an expected value equal to the parameter it is supposed to be estimating. Unbiasedness does not mean that the estimate we get with any particular sample is equal to the true parameter or even close. Rather, the mean of all estimates from infinitely many randomly drawn samples equals the true parameter.

[Figure: kernel density of the estimated coefficient b1 over repeated samples (values roughly between 0.5 and 2.1); the distribution is centred on the true parameter value.]
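A minimal Monte Carlo sketch of this idea: the true slope (set to 1.0 here, an arbitrary choice) is recovered on average across repeated samples, even though individual estimates scatter around it:

```python
# Minimal Monte Carlo sketch of unbiasedness: across repeated samples the
# mean of the OLS slope estimates is close to the true slope.
import numpy as np

rng = np.random.default_rng(2)
true_beta, n, reps = 1.0, 50, 5000
estimates = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    y = 2.0 + true_beta * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]   # slope estimate

print("mean of estimates:", estimates.mean())   # ~ 1.0 (unbiased)
print("std of estimates: ", estimates.std())    # sampling variation around the truth
```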

Page 22: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Sampling Variance of an estimator

Efficiency is a relative measure between two estimators – it refers to the sampling variance of an estimator: V(β̂).

Let β̂ and β̃ be two unbiased estimators of the true parameter β, with variances V(β̂) and V(β̃). Then β̂ is said to be relatively more efficient than β̃ if V(β̂) is smaller than V(β̃).

The property of relative efficiency only helps us to rank two unbiasedestimators.

[Figure: sampling densities of two unbiased estimators, b1 (roughly between −20 and 20) and b2 (roughly between −4 and 4); the estimator with the narrower density has the smaller sampling variance and is relatively more efficient.]

Page 23: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Trade-off between Bias and Efficiency
With real world data and the related problems we sometimes only have the choice between a biased but efficient and an unbiased but inefficient estimator. Then another criterion can be used to choose between the two estimators, the root mean squared error (RMSE). The RMSE is a combination of bias and efficiency and gives us a measure of the overall performance of an estimator.

RMSE:

RMSE(β̂) = √( (1/K) · Σ_{k=1..K} (β̂_k − β_true)² )

MSE(β̂) = E[(β̂ − β_true)²] = Var(β̂) + Bias(β̂, β_true)²

K measures the number of experiments, trials or simulations.

[Figure: sampling densities of the coefficient of z3 estimated by xtfevd and by fixed effects.]

Page 24: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Asymptotic Properties of Estimators

We can rule out certain silly estimators by studying the asymptotic or large sample properties of estimators.

We can say something about estimators that are biased and whose variances are not easily found.

Asymptotic analysis involves approximating the features of the sampling distribution of an estimator.

Consistency: how far is the estimator likely to be from the true parameter as we let the sample size increase indefinitely?

If N → ∞ the estimated beta converges to the true beta:

Unlike unbiasedness, consistency implies that the distribution of the estimator collapses around the true value as N approaches infinity.

Thus unbiased estimators are not necessarily consistent, but unbiased estimators whose variance shrinks to zero as the sample size grows are consistent.

lim_{n→∞} Pr( |β̂_n − β| > ε ) = 0,   i.e.   plim β̂_n = β

lim_{n→∞} E(β̂_n) = β

Page 25: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Multiple Regressions

• In most cases the dependent variable y is not just a function of a single explanatory variable but a combination of several explanatory variables.

• THUS: drawback of bivariate regression: impossible to draw ceteris paribus conclusions about how x affects y (omitted variable bias).

• Models with k independent variables:

• Control for omitted variable bias

• But: this increases the inefficiency of the regression since explanatory variables might be collinear.

y_i = α + β_1 x_i1 + β_2 x_i2 + … + β_k x_ik + ε_i

Page 26: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Obtaining OLS Estimates in Multiple Regressions

The intercept is the predicted value of y when all explanatory variables equal zero.

The estimated betas have partial effect or ceteris paribus interpretations.

We can obtain the predicted change in y given the changes in each x: when x_2 is held fixed, then beta_1 gives the change in y if x_1 changes by one unit.

ŷ_i = β̂_0 + β̂_1 x_i1 + β̂_2 x_i2

β̂_1 = [ Σ_i(x_i2 − x̄_2)² · Σ_i(x_i1 − x̄_1)(y_i − ȳ) − Σ_i(x_i1 − x̄_1)(x_i2 − x̄_2) · Σ_i(x_i2 − x̄_2)(y_i − ȳ) ]
      / [ Σ_i(x_i1 − x̄_1)² · Σ_i(x_i2 − x̄_2)² − ( Σ_i(x_i1 − x̄_1)(x_i2 − x̄_2) )² ]

In matrix notation:  β̂ = (X′X)⁻¹ X′y
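A minimal numpy sketch of the matrix formula with two regressors and a constant; the data and true coefficients are made up for illustration:

```python
# Minimal sketch of beta_hat = (X'X)^(-1) X'y with two regressors.
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y      # (X'X)^(-1) X'y
print(beta_hat)                                  # approx [1.0, 0.5, -2.0]

# Same result from a numerically safer solver:
print(np.linalg.lstsq(X, y, rcond=None)[0])
```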

Page 27: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

“Holding Other Factors Fixed”

• The power of multiple regression analysis is that it provides a ceteris paribus interpretation even though the data have not been collected in a ceteris paribus fashion.

• Example: multiple regression coefficients tell us what effect an additional year of education has on personal income if we hold social background, intelligence, sex, number of children, marital status and all other factors constant that also influence personal income.

Page 28: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Standard Error and Significance in Multiple Regressions

Var(β̂_1) = σ̂² / ( SST_1 · (1 − R_1²) ),   with   σ̂² = (1/(n − k − 1)) · Σ_i ε̂_i²

SST_1 = Σ_i (x_i1 − x̄_1)²

R_1² = SSE_1 / SST_1 is the R² for the regression of x_1 on the other explanatory variables

SE(β̂_1) = σ̂ / √( SST_1 · (1 − R_1²) )

Page 29: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

F – Test: Testing Multiple Linear Restrictions

• The t-test (as a significance test) is associated with any single OLS coefficient.

• We also want to test multiple hypotheses about the underlying parameters beta_0…beta_k.

• The F-test tests multiple restrictions: e.g. that all coefficients are jointly equal to zero:

H0: beta_0 = beta_1 = … = beta_k = 0

Ha: H0 is not true, thus at least one beta differs from zero

• The F-statistic (or F-ratio) is defined as:

• The F-statistic is F distributed under the Null-Hypothesis.

• F-test for overall significance of a regression: H0: all coefficients are jointly zero – in this case we can also compute the F-statistic by using the R² of the regression:

F = [ (SSR_r − SSR_ur) / q ] / [ SSR_ur / (n − k − 1) ]   ~   F(q, n − k − 1) under H0

F = [ R² / k ] / [ (1 − R²) / (n − k − 1) ]
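A minimal sketch of the R²-based overall F-test; n, k and the R² value are illustrative (the R² is borrowed from the regression output shown earlier):

```python
# Minimal sketch of the overall-significance F test computed from the R²
# of a regression with k regressors and n observations.
import numpy as np
from scipy import stats

n, k = 100, 3        # observations, explanatory variables (without the constant)
r2 = 0.4781          # R² of the estimated model (example value)

F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)       # upper-tail probability of F(k, n-k-1)
print(F, p)

# Reject H0 (all slopes jointly zero) at the 5% level if p < 0.05.
```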

Page 30: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

• SSR_r: Sum of Squared Residuals of the restricted model (constant only)

• SSR_ur: Sum of Squared Residuals of the unrestricted model (all regressors)

• SSR_r can never be smaller than SSR_ur, so F is always non-negative

• k – number of explanatory variables (regressors), n – number of observations, q – number of exclusion restrictions (q of the variables have zero coefficients): q = df_r − df_ur (difference in degrees of freedom between the restricted and unrestricted models; df_r > df_ur)

• The F-test is a one-sided test, since the F-statistic is always non-negative

• We reject the Null at a given significance level if F > F_critical for this significance level.

• If H0 is rejected then we say that all explanatory variables are jointly statistically significant at the chosen significance level.

• THUS: The F-test only allows one to not reject H0 if the t-tests for all single variables are insignificant too.

Page 31: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Goodness of Fit in multiple Regressions:

As with simple bivariate regressions we can define SST, SSE and SSR, and we can calculate the R² in the same way.

BUT: the R² never decreases but tends to increase with the number of explanatory variables.

THUS, the R² is a poor tool for deciding whether one variable or several variables should be added to a model.

We want to know whether a variable has a nonzero partial effect on y in the population.

Adjusted R²: takes the number of explanatory variables into account, since the R² increases with the number of regressors:

k is the number of explanatory variables and n the number of observations.

R²_adj = 1 − (1 − R²) · (n − 1) / (n − k − 1)

Page 32: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Comparing Coefficients

The size of the slope parameters depends on the scaling of the variables (on which scale a variable is measured), e.g. population in thousands or in millions etc.

To be able to compare the size of the effects of different explanatory variables in a multiple regression we can use standardized coefficients:

Standardized coefficients take the standard deviations of the dependent and explanatory variables into account. They describe how much y changes if x changes by one standard deviation instead of one unit. If x changes by 1 SD, y changes by b_hat SD. This makes the scale of the regressors irrelevant and we can compare the magnitude of the effects of different explanatory variables (the variable with the largest standardized coefficient is most important in explaining changes in the dependent variable).

b̂_j = ( σ̂_{x_j} / σ̂_y ) · β̂_j      for j = 1, …, k

Page 33: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Problems in Multiple Regressions:
1. Multicollinearity

• Perfect multicollinearity leads to the drop-out of one of the variables: if x1 and x2 are perfectly correlated (correlation of 1), the statistical program at hand does the job.

• The higher the correlation, the larger the population variance of the coefficients, the less efficient the estimation and the higher the probability of getting erratic point estimates. Multicollinearity can result in numerically unstable estimates of the regression coefficients (small changes in X can result in large changes to the estimated regression coefficients).

• Trade-off between omitted variable bias and inefficiency due to multicollinearity.

Page 34: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Testing for Multicollinearity

Correlation between explanatory variables: pairwise collinearity can be determined from viewing a correlation matrix of the independent variables. However, correlation matrices will not reveal higher order collinearity.

Variance Inflation Factor (vif): measures the impact of collinearity among the x in a regression model on the precision of estimation. vif detects higher order multicollinearity: one or more x is/are close to a linear combination of the other x.

• Variance inflation factors are a scaled version of the multiple correlation coefficient between variable j and the rest of the independent variables. Specifically,

VIF_j = 1 / (1 − R_j²)

where R_j is the multiple correlation coefficient.

• Variance inflation factors are often given as the reciprocal of the above formula. In this case, they are referred to as the tolerances.

• If R_j equals zero (i.e., no correlation between X_j and the remaining independent variables), then VIF_j equals 1. This is the minimum value. Neter, Wasserman, and Kutner (1990) recommend looking at the largest VIF value. A value greater than 10 is an indication of potential multicollinearity problems.
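A minimal sketch of computing VIFs by brute force – regress each explanatory variable on the others and apply VIF_j = 1/(1 − R_j²); the data are simulated, with x3 constructed to be nearly collinear:

```python
# Minimal sketch: variance inflation factors via auxiliary regressions.
import numpy as np

def r_squared(X, y):
    """R² of an OLS regression of y on X (X already contains a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = 0.7 * x1 + 0.7 * x2 + 0.1 * rng.normal(size=n)   # nearly a linear combination
regressors = {"x1": x1, "x2": x2, "x3": x3}

for name, xj in regressors.items():
    others = np.column_stack([np.ones(n)] + [v for k, v in regressors.items() if k != name])
    rj2 = r_squared(others, xj)
    print(name, "VIF =", 1 / (1 - rj2))     # values well above 10 signal trouble
```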

Page 35: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Possible Solutions

• Reduce the overall error: by including explanatory variables that are not correlated with other variables but explain the dependent variable

• Drop variables which are highly multi-collinear

• Reduce the sampling variance by increasing the number of observations

• Increase the variance of the explanatory variables

• If variables are conceptually similar – combine them into a single index, e.g. by factor or principal component analysis

Page 36: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

2. Omitted Variable Bias
• The effect of omitted variables that ought to be included:

• Suppose the dependent variable y depends on two explanatory variables:

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ε_i

• But you are unaware of the importance of x_2 and only include x_1:

y_i = β_0 + β_1 x_i1 + ε_i

• If x_2 is omitted from the regression equation, x_1 will have a "double" effect on y (a direct effect and one mimicking x_2)

• The mimicking effect depends on the ability of x_1 to mimic x_2 (the correlation) and on how much x_2 would explain y

• Beta_1 in the second equation is biased upwards in case x_1 and x_2 are positively correlated and downward biased otherwise

• Beta_1 is only unbiased if x_1 and x_2 are not related (corr(x_1, x_2) = 0)

• However: if a variable is included that is unnecessary, because it does not explain any variation in y, the regression becomes inefficient and the reliability of the point estimates decreases.

Page 37: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Testing for Omitted Variables

• Heteroskedasticity of the error term with respect to the observations of a specific independent variable is a good indication of omitted variable bias:

• Plot the error term against all explanatory variables

• Ramsey RESET F-test for omitted variables in the whole model: tests for wrong functional form (if e.g. an interaction term is omitted):

– Regress Y on the X's and keep the fitted value Y_hat;

– Regress Y on the X's, Y_hat² and Y_hat³;

– Test the significance of the fitted value terms using an F test.

• Szroeter test for monotonic variance of the error term in the explanatory variables

Solutions:

• Include variables that are theoretically important and have a high probability of being correlated with one or more variables in the model and of explaining significant parts of the variance in the DV.

• Fixed unit effects for unobserved unit heterogeneity (time-invariant unmeasurable characteristics of e.g. countries – culture, institutions)

Page 38: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

3. Heteroskedasticity

The variance of the error term is not constant across observations but depends on unobserved effects; not controlling for this problem violates one of the basic assumptions of linear regression and renders the estimation results inefficient.

Possible causes:
• Omitted variables, for example: spending might vary with the economic size of a country, but size is not included in the model.

Tests:
• plot the error term against all independent variables
• White test if the form of heteroskedasticity is unknown
• Breusch-Pagan Lagrange Multiplier test if the form is known

Solutions:
• Robust Huber-White sandwich estimator (GLS)
• White heteroskedasticity-consistent VC estimate: adjusts the variance-covariance matrix of the error term
• More substantially: include omitted variables
• Dummies for groups of individuals or countries that are assumed to behave more similarly than others

Page 39: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Tests for Heteroskedasticity:

a. Breusch-Pagan LM test for a known form of heteroskedasticity: groupwise

LM = (T/2) · Σ_{i=1..n} ( s_i² / s² − 1 )²

s_i² = group-specific residual variance (from the sum of the group-specific squared residuals)
s² = overall residual variance, computed from the OLS residuals

H0: homoskedasticity; the statistic is ~ χ² with n − 1 degrees of freedom.

The LM test assumes normality of the residuals and is not appropriate if this assumption is not met.

b. Likelihood Ratio Statistic

Residuals are computed using MLE (e.g. iterated FGLS; OLS implies a loss of power):

LR = NT · ln(σ̂²) − T · Σ_{i=1..n} ln(σ̂_i²)   ~   χ²(n − 1)

Page 40: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

c. White test if the form of heteroskedasticity is unknown:

• H0: V(ε_i | x_i) = σ²

• Ha: V(ε_i | x_i) = σ_i²

1. Estimate the model under H0

2. Compute the squared residuals ê_i²

3. Use the squared residuals as dependent variable of an auxiliary regression; RHS: all regressors, their quadratic forms and interaction terms:

ê_i² = δ_0 + δ_1 x_i1 + … + δ_k x_ik + δ_{k+1} x_i1² + … + δ_q x_i1·x_i2 + … + ν_i

4. Compute the White statistic from the R² of the auxiliary regression:

W = n · R²_aux   ~a   χ²(q)

5. Use a one-sided test and check whether n·R² is larger than the 95% quantile of the χ² distribution
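A minimal sketch of the White test on simulated data whose error variance depends on x1 (variable names and the data-generating process are assumptions for illustration):

```python
# Minimal sketch of the White test: regress squared OLS residuals on the
# regressors, their squares and the cross product, then compare n*R² with
# a chi-squared critical value.
import numpy as np
from scipy import stats

def ols_resid_and_r2(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return resid, r2

rng = np.random.default_rng(5)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + x1 + x2 + rng.normal(size=n) * (1 + np.abs(x1))   # heteroskedastic errors

X = np.column_stack([np.ones(n), x1, x2])
e, _ = ols_resid_and_r2(X, y)

# Auxiliary regression: e² on regressors, squares and the interaction term.
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
_, r2_aux = ols_resid_and_r2(Z, e**2)

q = Z.shape[1] - 1                 # number of slope terms in the auxiliary regression
W = n * r2_aux
print(W, stats.chi2.sf(W, q))      # small p-value -> reject homoskedasticity
```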

Page 41: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Robust White Heteroskedasticity Consistent Variance-Covariance Estimator:

Normal variance of beta:

Var(β̂) = σ² / (N · Var(X))

Robust White VC matrix:

V̂(β̂) = (X′X)⁻¹ X′ D̂ X (X′X)⁻¹,   with   D̂ = diag(ê_i²)

D is an n×n matrix with off-diagonal elements equal to 0 and the squared residuals on the diagonal.

The normal variance-covariance matrix is weighted by the non-constant error variance.

Robust standard errors therefore tend to be larger.
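A minimal sketch of the sandwich formula on simulated heteroskedastic data, comparing conventional and robust standard errors (HC0 form, without small-sample corrections):

```python
# Minimal sketch of the White (HC0) sandwich estimator
# (X'X)^(-1) X'DX (X'X)^(-1) with D = diag(e_i²).
import numpy as np

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n) * (1 + np.abs(x))     # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
V_classic = (e @ e / (n - X.shape[1])) * XtX_inv         # conventional VC matrix
V_robust = XtX_inv @ (X.T * e**2) @ X @ XtX_inv          # sandwich with D = diag(e²)

print("classic SEs:", np.sqrt(np.diag(V_classic)))
print("robust  SEs:", np.sqrt(np.diag(V_robust)))        # usually larger here
```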

Page 42: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Generalized Least Squares Approaches

• The structure of the variance-covariance matrix Omega is used not just to adjust the standard errors but also the estimated coefficients.

• GLS can be an econometric solution to many violations of the G-M conditions (autocorrelation, heteroskedasticity, spatial correlation…), since the Omega matrix can be flexibly specified.

• Since the Omega matrix is not known, it has to be estimated and GLS becomes FGLS (Feasible Generalized Least Squares).

• All FGLS approaches are problematic if the number of observations is limited – very inefficient, since the Omega matrix has to be estimated.

Beta:

β̂_GLS = ( Σ_{i=1..N} X_i′ Ω⁻¹ X_i )⁻¹ · Σ_{i=1..N} X_i′ Ω⁻¹ y_i

β̂_FGLS = ( Σ_{i=1..N} X_i′ Ω̂⁻¹ X_i )⁻¹ · Σ_{i=1..N} X_i′ Ω̂⁻¹ y_i

Estimated covariance matrix:

( X′ Ω⁻¹ X )⁻¹

Page 43: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Omega matrix with heteroskedastic error structure and contemporaneously correlated errors, but in principle FGLS can handle all different correlation structures:

      | σ₁²   σ₂₁   σ₃₁  …  σ_n1 |
      | σ₁₂   σ₂²   σ₃₂  …  σ_n2 |
Ω =   | σ₁₃   σ₂₃   σ₃²  …  σ_n3 |
      |  ⋮     ⋮     ⋮   ⋱    ⋮  |
      | σ₁n   σ₂n   σ₃n  …  σ_n² |

Page 44: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

4. Autocorrelation

The observation of the residual in t1 is dependent on the observation in t0: not controlling for autocorrelation violates one of the basic assumptions of OLS and may bias the estimation of the beta coefficients:

ε_it = ρ ε_{i,t−1} + ν_it

Options:
• lagged dependent variable
• differencing the dependent variable
• differencing all variables
• Prais-Winsten transformation of the data
• HAC consistent VC matrix

Tests:
• Durbin-Watson, Durbin's m, Breusch-Godfrey test
• Regress e on lag(e)

Page 45: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Autocorrelation

The error term in t1 is dependent on the error term in t0: not controlling for autocorrelation violates one of the basic assumptions of OLS and may bias the estimation of the beta coefficients:

ε_it = ρ ε_{i,t−1} + ν_it

The residual of a regression model picks up the influences of those variables affecting the DV that have not been included in the regression equation. Thus, persistence in excluded variables is the most frequent cause of autocorrelation.

Autocorrelation makes no prediction about a trend, though a trend in the DV is often a sign of serial correlation.

Positive autocorrelation: rho is positive: it is more likely that a positive value of the error term is followed by a positive one and a negative value by a negative one.

Negative autocorrelation: rho is negative: it is more likely that a positive value of the error term is followed by a negative one and vice versa.

Page 46: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

DW test for first order AC:

• Regression must have an intercept

• Explanatory variables have to be deterministic

• Inclusion of LDV biases statistic towards 2

The efficiency problem of serial correlation can be fixed by the Newey-West HAC consistent VC matrix for heteroskedasticity of unknown form and AC of order p. Problem: the VC matrix is consistent but the coefficients can still be biased! (HAC is possible with "ivreg2" in Stata)

d = Σ_{t=2..T} (e_t − e_{t−1})² / Σ_{t=1..T} e_t²

Newey-West HAC variance estimator (lag truncation p, weights w_l = 1 − l/(p+1)):

V̂_NW = T · (X′X)⁻¹ Ŝ* (X′X)⁻¹

Ŝ* = (1/T) · Σ_{t=1..T} e_t² x_t x_t′ + (1/T) · Σ_{l=1..p} w_l · Σ_{t=l+1..T} e_t e_{t−l} ( x_t x_{t−l}′ + x_{t−l} x_t′ )

Page 47: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

OR a simpler Test:

• Estimate the model by OLS

• compute the residuals

• Regress the residuals on all independent variables (including the LDV if present) and the lagged residuals

• If the coefficient on the lagged residual is significant (with the usual t-test), we can reject the null of independent errors (see the sketch below).
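A minimal sketch of both checks on a simulated AR(1) error process (rho = 0.6 is an arbitrary choice): the Durbin-Watson statistic and the regression of the residual on its lag:

```python
# Minimal sketch: Durbin-Watson statistic and the residual-on-lagged-residual check.
import numpy as np

rng = np.random.default_rng(7)
T, rho = 200, 0.6
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + rng.normal()
y = 1 + 0.5 * x + eps

X = np.column_stack([np.ones(T), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)       # ~2 under no AC, <2 with positive AC
print("Durbin-Watson:", dw)

# Regress e_t on x_t and e_{t-1}; a clearly nonzero coefficient on e_{t-1}
# indicates serial correlation.
Z = np.column_stack([np.ones(T - 1), x[1:], e[:-1]])
print("rho_hat:", np.linalg.lstsq(Z, e[1:], rcond=None)[0][-1])
```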

Page 48: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Lagged Dependent Variable

• The interpretation of the LDV as a measure of time-persistence is misleading

• The LDV captures the average dynamic effect; this can be shown by Cochrane-Orcutt distributed lag models. Thus the LDV assumes that all x-variables have a one-period lagged effect on y

→ make sure the interpretation is correct – calculate the real effect of the x-variables

• Is an insignificant coefficient really insignificant if the coefficient of lagged y is highly significant?

y_it = β_0 + ρ y_{i,t−1} + β_k x_it + ε_it

y_it − ρ y_{i,t−1} = β_0 + β_k x_it + ε_it

Long-run effect of x implied by the LDV model: β_k / (1 − ρ)

Page 49: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

First Difference models

• Differencing only the dependent variable – only if theory predicts effects of levels on changes

• The FD estimator assumes that the coefficient of the LDV is exactly 1 – this is often not true

• Theory predicts effects of changes on changes

• Suggested remedy if the time series is non-stationary (has a single unit root), asymptotic analysis for T → ∞

• Consistent

y_it − y_{i,t−1} = Σ_{k=1..K} β_k ( x_{k,it} − x_{k,i,t−1} ) + ( ε_it − ε_{i,t−1} )

Δy_it = Σ_{k=1..K} β_k Δx_{k,it} + Δε_it

Page 50: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Prais-Winsten Transformation
• Models the serial correlation in the error term – the regression results for the X variables are more straightforwardly interpretable:

y_it = α + β x_it + ε_it     with     ε_it = ρ ε_{i,t−1} + ν_it

The ν_it are iid with ν_it ~ N(0, σ_ν²).

• The VC matrix of the error term is

                  |  1        ρ        ρ²      …  ρ^(T−1) |
                  |  ρ        1        ρ       …  ρ^(T−2) |
σ_ν²/(1−ρ²)  ·    |  ρ²       ρ        1       …  ρ^(T−3) |
                  |  ⋮        ⋮        ⋮       ⋱    ⋮     |
                  |  ρ^(T−1)  ρ^(T−2)  ρ^(T−3) …   1      |

• The matrix is stacked for N units. Diagonals are 1.

• Prais-Winsten is estimated by GLS. It is derived from the AR(1) model for the error term. The first observation is preserved.

Page 51: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

1. Estimation of a standard linear regression:

y_it = α + β x_it + ε_it

2. An estimate of the correlation in the residuals is then obtained by the following auxiliary regression:

ε̂_it = ρ ε̂_{i,t−1} + ν_it

3. A Cochrane-Orcutt transformation is applied for observations t = 2, …, n:

y_it − ρ̂ y_{i,t−1} = α(1 − ρ̂) + β( x_it − ρ̂ x_{i,t−1} ) + ν_it

4. And the transformation for t = 1 is as follows:

√(1 − ρ̂²) · y_i1 = α · √(1 − ρ̂²) + β · √(1 − ρ̂²) · x_i1 + √(1 − ρ̂²) · ε_i1

5. With iterating to convergence, the whole process is repeated until the change in the estimate of rho is within a specified tolerance; the new estimates are used to produce fitted values for y, and rho is re-estimated by:

( y_it − ŷ_it ) = ρ · ( y_{i,t−1} − ŷ_{i,t−1} ) + ν_it

Page 52: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Distributed Lag Models

• The simplest form is Cochrane-Orcutt – the dynamic structure of all independent variables is captured by one parameter, either in the error term or as LDV

• If the dynamics are that simple – LDV or Prais-Winsten is fine – this saves degrees of freedom

• Problem: if theory predicts different lags for different right-hand side variables – then a mis-specified model necessarily leads to bias

• Test down – start with a relatively large number of lags for potential candidates:

y_it = α + β_1 x_it + β_2 x_{i,t−1} + β_3 x_{i,t−2} + … + β_{n+1} x_{i,t−n} + ε_it

Page 53: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Specification Issues in Multiple Regressions:
1. Non-Linearity

One or more explanatory variables have a non-linear effect on the dependent variable: estimating a linear model would lead to wrong and/or insignificant results. Thus, even though in the population there exists a relationship between an explanatory variable and the dependent variable, this relationship cannot be detected due to the strict linearity assumption of OLS.

Tests:

• The Ramsey RESET F-test gives a first indication for the whole model

• In general, we can use acprplot to verify the linearity assumption against an explanatory variable – though this is just "eye-balling"

• Theoretical expectations should guide the inclusion of squared terms.

Page 54: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)
Page 55: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

-10

-50

51

0A

ugm

ente

dco

mpon

entp

lus

resi

dua

l

0 .2 .4 .6 .8 1institutional openness to trade standardized

-15

-10

-50

510

Au

gm

ente

dco

mpon

entp

lus

resi

dua

l

0 2 4 6 8 10level of democracy

Page 56: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Solutions
Handy solutions without leaving the linear regression framework:

• Logarithmize the IV and DV: this gives you the elasticity; higher values are weighted less (Engel curve – income elasticity of demand). This model is called a log-log model or a log-linear model:

log y_i = log α + β log x_i + log ε_i

– Different functional forms give parameter estimates that have different substantial interpretations. The parameters of the linear model have an interpretation as marginal effects; the elasticities will vary depending on the data. In contrast, the parameters of the log-log model have an interpretation as elasticities, so the log-log model assumes a constant elasticity over all values of the data set. Therefore the coefficients of a log-linear model can be interpreted as percentage changes – if the explanatory variable changes by one percent the dependent variable changes by beta percent.

– The log transformation is only applicable when all the observations in the data set are positive. This can be guaranteed by using a transformation like log(X+k) where k is a positive scalar chosen to ensure positive values. However, careful thought has to be given to the interpretation of the parameter estimates.

– For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate competing models by OLS and choose the model with the highest R-squared.

• Include an additional squared term of the IV to test for U-shape and inverse U-shape relationships. Careful with the interpretation! The size of the two coefficients (linear and squared) determines whether there is indeed a U-shaped or inverse U-shaped relationship:

y_i = α + β_1 x_i + β_2 x_i² + ε_i

Page 57: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Hausken, Martin, Plümper 2004: Government Spending and Taxation in Democracies and Autocracies, Constitutional Political Economy 15, 239-59.

Page 58: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

polity_sqr   polity   govcon
   0.00       0.0     20.049292
   0.25       0.5     18.987946
   1.00       1.0     18.024796
   2.25       1.5     17.159842
   4.00       2.0     16.393084
   6.25       2.5     15.724521
   9.00       3.0     15.154153
  12.25       3.5     14.681982
  16.00       4.0     14.308005
  20.25       4.5     14.032225
  25.00       5.0     13.854640
  30.25       5.5     13.775251
  36.00       6.0     13.794057
  42.25       6.5     13.911059
  49.00       7.0     14.126257
  56.25       7.5     14.439650
  64.00       8.0     14.851239
  72.25       8.5     15.361024
  81.00       9.0     15.969004
  90.25       9.5     16.675180
 100.00      10.0     17.479551
Page 59: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

The „u“ shaped relationship between democracy and government spending:

[Figure: government consumption in % of GDP (roughly 13–21) plotted against the degree of democracy (0–10), tracing the u-shaped relationship from the table above.]

Page 60: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

2. Interaction Effects

Two explanatory variables do not only have a direct effect on the dependent variable but also a combined effect:

y_i = α + β_1 x_1i + β_2 x_2i + β_3 x_1i·x_2i + ε_i

Interpretation: combined effect: b1·SD(x1) + b2·SD(x2) + b3·SD(x1·x2)

Example: the monetary policy of a currency union has a direct effect on monetary policy in outsider countries, but this effect is increased by import shares.
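A minimal sketch of how the conditional (marginal) effect of x1 and its confidence interval are computed from such an interaction model; all coefficient and variance-covariance values below are made-up illustrations, not estimates from the example:

```python
# Minimal sketch: in y = a + b1*x1 + b2*x2 + b3*x1*x2 + e, the marginal effect of
# x1 is b1 + b3*x2, with variance Var(b1) + x2²·Var(b3) + 2·x2·Cov(b1, b3).
import numpy as np

b1, b3 = 0.8, 0.02                                  # assumed estimates
var_b1, var_b3, cov_b13 = 0.04, 0.0001, -0.0015     # assumed VC-matrix entries

x2 = np.linspace(0, 150, 4)        # values of the conditioning variable
me = b1 + b3 * x2
se = np.sqrt(var_b1 + x2**2 * var_b3 + 2 * x2 * cov_b13)

for v, m, s in zip(x2, me, se):
    print(f"x2={v:6.1f}: marginal effect={m:.3f}, 95% CI=({m-1.96*s:.3f}, {m+1.96*s:.3f})")
```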

Page 61: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Example: government spending

[Figure.]
Page 62: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

[Figure: government spending in % of GDP (roughly 35–50) under low vs. high unemployment, shown separately for low and high christian democratic portfolio.]
Page 63: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)
Page 64: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)
Page 65: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

20

30

40

50

60

70

sp

en

d

0 5 10 15 20unem

government spending trade+1sd

trade mean trade-1sd

Page 66: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

[Figure: predicted government spending (spend, roughly 20–80) against trade (0–150), with separate lines for unemployment at its mean, +1 SD and −1 SD.]
Page 67: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Interaction Effects of Continuous Variables

[Figure: marginal effect of unemployment on spending as trade openness changes (0–160), with the 95% confidence interval. Dependent variable: government spending.]

Page 68: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

[Figure: marginal effect of unemployment on government spending against international trade exposure (0–150). Thick dashed lines give the 95% confidence interval; the thin dashed line is a kernel density estimate of trade; the mean of international trade exposure is marked.]

Page 69: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

3. Dummy variables

An explanatory variable that takes on only the values 0 and 1.

Example: DV: spending, IV: whether a country is a democracy (1) or not (0):

y_i = α + β D_i + ε_i

Alpha then is the expected level for non-democracies and alpha + beta is the expected level for democracies.

Page 70: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

4. Outliers

Problem: The OLS principle implies the minimization of squared residuals. From this it follows that extreme cases can have a strong impact on the regression line. Inclusion/exclusion of extreme cases might change the results significantly.

The slope and intercept of the least squares line are very sensitive to data points which lie far from the true regression line. These points are called outliers, i.e. extreme values of observed variables that can distort estimates of regression coefficients.

Page 71: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Test for Outliers

• Symmetry (symplot) and normality (dotplot) of the dependent variable give a first indication of outlier cases

• Residual-vs.-fitted plots (rvfplot) indicate which observations of the DV are far away from the predicted values

• lvr2plot is the leverage-against-residual-squared plot. The upper left corner of the plot will be points that are high in leverage and the lower right corner will be points that are high in the absolute value of the residuals. The upper right portion will be those points that are both high in leverage and in the absolute value of the residuals.

• DFBETA: how much would the coefficient of an explanatory variable change if we omitted one observation?

The measure that captures how much impact each observation has on a particular coefficient is the DFBETA. The DFBETA for an explanatory variable and for a particular observation is the difference between the regression coefficient calculated for all of the data and the regression coefficient calculated with the observation deleted, scaled by the standard error calculated with the observation deleted. The cut-off value for DFBETAs is 2/sqrt(n), where n is the number of observations.
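A minimal brute-force sketch of DFBETAs – refit the model once per deleted observation and compare against the 2/sqrt(n) cut-off; the data are simulated and one artificial outlier is planted:

```python
# Minimal sketch of DFBETAs by leave-one-out refitting.
import numpy as np

def fit(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    vc = (e @ e / (len(y) - X.shape[1])) * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(vc))

rng = np.random.default_rng(8)
n = 50
x = rng.normal(size=n)
y = 1 + 0.5 * x + rng.normal(size=n)
x[0], y[0] = 6.0, -10.0                      # an artificial influential point

X = np.column_stack([np.ones(n), x])
b_full, _ = fit(X, y)

cutoff = 2 / np.sqrt(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, se_i = fit(X[keep], y[keep])
    dfbeta = (b_full[1] - b_i[1]) / se_i[1]  # DFBETA for the slope
    if abs(dfbeta) > cutoff:
        print(f"observation {i}: DFBETA = {dfbeta:.2f} exceeds 2/sqrt(n) = {cutoff:.2f}")
```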

Page 72: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

[Figure: leverage-versus-residual-squared plot (lvr2plot) with every observation labelled by country (Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Ireland, Italy, Japan, Netherlands, Norway, Sweden, Switzerland, UK, US); x-axis: normalized residual squared (0–0.025), y-axis: leverage (0–0.08).]

Page 73: Gauss-Markov Assumptions, Full Ideal Conditions of OLS · correlation/ autocorrelation (panel and time-series data)

Solutions: Outliers

• Include or exclude obvious outlier cases and check their impact on the regression coefficients.

• Logarithmize the dependent variable and possibly the explanatory variables as well – this reduces the impact of larger values.

Jackknife, bootstrap:

• Are both tests and solutions at the same time: they show whether single observations have an impact on the results. If so, one can use the jackknifed and bootstrapped coefficients and standard errors, which are more robust to outliers than normal OLS results.

• Jackknife: takes the original dataset and runs the same regression N times, each time leaving one observation out (so each regression uses N−1 observations).
Example command in STATA: "jackknife _b _se, eclass: reg spend unem growthpc depratio left cdem trade lowwage fdi skand"

• Bootstrapping is a re-sampling technique: for the specified number of repetitions, the same regression is run for a different sample randomly drawn with replacement from the original dataset.
Example command: "bootstrap _b _se, reps(1000): reg spend unem growthpc depratio left cdem trade lowwage fdi skand"
• Bootstrapping is a re-sampling technique: for the specified numberof repetitions, the same regression is run for a different samplerandomly drawn from the original dataset.Example command: „bootstrap _b _se, reps(1000): reg spend unemgrowthpc depratio left cdem trade lowwage fdi skand “