
Lesson11-1 Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data

Lesson 11: Regressions Part II


Does watching television rot your mind?

Zavodny, Madeline (2006): “Does watching television rot your mind? Estimates of the effect on test scores,” Economics of Education Review, 25 (5): 565–573

Television is one of the most omnipresent features of Americans’ lives. The average American adult watches about 15 h of television per week, accounting for almost one-half of free time.

The substantial amount of time that most individuals spend watching television makes it important to examine its effects on society, including human capital accumulation and academic achievement.


Data & Regression model

This analysis uses three data sets to examine the relationship between television viewing and test scores: the National Longitudinal Survey of Youth 1979 (NLSY), the HSB survey and the NELS. Each survey includes test scores and a question about the number of hours of television watched by young adults.

Test score of individual i at time t


Summary of samples from data sets


Regression results

**p<0.01; *p<0.05; †p<0.1


Multiple Linear Regression Model

Relationship Between Variables Is a Linear Function

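$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$

$Y$: dependent (response) variable

$X_1, \dots, X_k$: independent (explanatory) variables

$\beta_0$: Y intercept; $\beta_1, \dots, \beta_k$: slopes; $\varepsilon$: random error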


Finance Application: multifactor pricing model

It is assumed that rate of return on a stock (R) is linearly related to the rate of return on some factor and the rate of return on the overall market (Rm).

$R_{it} = \beta_0 + \beta_{oi} R_{ot} + \beta_1 R_{mt} + \varepsilon_{it}$

$R_{it}$: rate of return on a particular oil company stock i at time t

$R_{mt}$: rate of return on some major stock index

$R_{ot}$: the rate of return on crude oil price on date t


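Estimation by Method of Moments: Number of moment conditions needed

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$

There are k+1 parameters to estimate, so we need k+1 moment conditions.

Assumption #1: $E(\varepsilon) = 0$, which implies $E(y) - \beta_0 - \beta_1 E(x_1) - \beta_2 E(x_2) - \dots - \beta_k E(x_k) = 0$

Assumption #2: $E(\varepsilon x_1) = 0$, which implies $E[(y - \beta_0 - \beta_1 x_1 - \dots - \beta_k x_k) x_1] = 0$. Since $Cov(\varepsilon, x_1) = E(\varepsilon x_1) - E(\varepsilon)E(x_1) = E(\varepsilon x_1)$, the assumption really implies that $\varepsilon$ and $x_1$ are uncorrelated.

Assumption #3: $E(\varepsilon x_2) = 0$

Assumption #4: $E(\varepsilon x_3) = 0$

…

Assumption #k+1: $E(\varepsilon x_k) = 0$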


Estimation of $\beta_0, \beta_1, \beta_2, \dots, \beta_k$

Method of moments

Two approaches:

1. Solve for $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ from the k+1 moment conditions, in terms of covariances, variances and means. Plug in the sample analogs of these covariances, variances and means to produce the sample estimates b0, b1, b2, …, bk.

2. Replace $\beta_0, \beta_1, \dots, \beta_k$ with b0, b1, b2, …, bk and solve for them directly from the sample analogs of the k+1 moment conditions.
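For a single regressor (k = 1), the first approach has a closed form: the two moment conditions give $\beta_1 = Cov(x, y)/Var(x)$ and $\beta_0 = E(y) - \beta_1 E(x)$. The sketch below computes the sample analogs in Python and then checks that both sample moment conditions hold at the solution. The data values and the function names are hypothetical, chosen only for illustration.

```python
# Method-of-moments estimates for y = b0 + b1*x + e (one regressor).
# The data below are hypothetical, for illustration only.

def mean(values):
    return sum(values) / len(values)

def mom_simple_regression(x, y):
    """Solve the sample analogs of E(e) = 0 and E(e*x) = 0 for (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance and variance; the 1/n factors cancel in the ratio.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = mom_simple_regression(x, y)

# Both sample moment conditions should be zero at the solution (up to rounding).
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
moment_1 = mean(residuals)                                # analog of E(e) = 0
moment_2 = mean([e * xi for e, xi in zip(residuals, x)])  # analog of E(e*x) = 0
print(b0, b1, moment_1, moment_2)
```

With more than one regressor the same idea applies, but the k+1 sample moment conditions have to be solved as a linear system.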


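Estimation of $\beta_0, \beta_1, \beta_2, \dots, \beta_k$

Maximum Likelihood

Assume the $\varepsilon_i$ are independent and identically distributed normal with zero mean and variance $\sigma^2$. Denote the normal density of $\varepsilon$ by $f(\varepsilon) = f(y - \beta_0 - \beta_1 x_1 - \beta_2 x_2 - \dots - \beta_k x_k)$. For candidate values b0, b1, …, bk, the implied residual e has normal density

f(e) = f(y-b0-b1x1-b2x2-…-bkxk)

Choose b0, b1, b2, …, bk to maximize the joint likelihood:

L(b0, b1, b2, …, bk) = f(e1)*f(e2)*…*f(en)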


To estimate $\beta_0, \beta_1, \dots, \beta_k$ using ML (Computer)

We do not know $\beta_0, \beta_1, \beta_2, \dots, \beta_k$. Nor do we know $\varepsilon_i$. In fact, our objective is to estimate $\beta_0, \beta_1, \beta_2, \dots, \beta_k$.

The procedure of ML:

1. Assume a combination of $\beta_0, \beta_1, \beta_2, \dots, \beta_k$, call it b0, b1, b2, …, bk. Compute the implied ei = yi-b0-b1x1i-b2x2i-…-bkxki and f(ei)=f(yi-b0-b1x1i-b2x2i-…-bkxki)

2. Compute the joint likelihood conditional on the assumed values of b0, b1, b2, …, bk:

L(b0, b1, b2, …, bk) = f(e1)*f(e2)*…*f(en)

Assume many more combinations of $\beta_0, \beta_1, \beta_2, \dots, \beta_k$, and repeat the above two steps, using a computer program (such as Excel).

Choose the b0, b1, b2, …, bk that yield the largest joint likelihood.
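The search over many candidate combinations can be sketched as a grid search in Python. This is only an illustration under stated assumptions: the data are hypothetical, there is one regressor, the grid bounds and step (0.01) are arbitrary choices, and the error standard deviation is fixed at 1 (its value does not change which candidate wins). The code sums log densities instead of multiplying densities, which picks the same maximizer while avoiding very small products.

```python
import math

# Hypothetical data for y = b0 + b1*x + e; values are illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
SIGMA = 1.0  # assumed error standard deviation; it does not change the argmax

def normal_density(e, sigma=SIGMA):
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(b0, b1):
    # Sum of log f(e_i); maximizing this is equivalent to maximizing the product.
    return sum(math.log(normal_density(yi - b0 - b1 * xi)) for xi, yi in zip(x, y))

# Step through a grid of candidate (b0, b1) pairs and keep the best one.
grid_b0 = [i / 100 for i in range(-100, 101)]  # -1.00, -0.99, ..., 1.00
grid_b1 = [i / 100 for i in range(100, 301)]   #  1.00,  1.01, ..., 3.00
best_loglik, best_b0, best_b1 = max(
    (log_likelihood(b0, b1), b0, b1) for b0 in grid_b0 for b1 in grid_b1
)
print(best_b0, best_b1)
```

Because the errors are assumed normal, the pair with the largest likelihood is also the pair with the smallest sum of squared residuals.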


To estimate $\beta_0, \beta_1, \dots, \beta_k$ using ML (Calculus)

Choose b0, b1, b2, …, bk to maximize the likelihood function L(b0, b1, b2, …, bk) using calculus.

Take the first derivative of L(b0, b1, b2, …, bk) with respect to b0 and set it to zero.

Take the first derivative of L(b0, b1, b2, …, bk) with respect to bj and set it to zero, for each j = 1, …, k.

Solve for b0, b1, b2, …, bk using the k+1 equations.



Estimation: Ordinary least squares

For each value of X, there is a group of Y values, and these Y values are normally distributed.

$Y_i \sim N(E(Y|X_1, X_2, \dots, X_k), \sigma_i^2)$, i = 1, 2, …, n

The means of these normal distributions of Y values all lie on the straight line of regression.

$E(Y|X_1, X_2, \dots, X_k) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

The standard deviations of these normal distributions are equal.

$\sigma_i^2 = \sigma^2$, i = 1, 2, …, n

i.e., homoskedasticity


Choosing the line that fits best: Ordinary Least Squares (OLS) Principle

Straight lines can be described generally by $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki}$, i = 1, …, n

Finding the best line, the one with the smallest sum of squared differences, is the same as solving

$\min_{b_0, b_1, \dots, b_k} S(b_0, b_1, \dots, b_k) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki})]^2$

It can be shown that this minimization yields the same sample moment conditions as discussed earlier in the method of moments.


It can be shown that the estimators are BLUE

Best: smallest variance

Linear: linear combination of yi

Unbiased: $E(b_0) = \beta_0$, $E(b_1) = \beta_1$

Estimator


Interpretation of Coefficients

$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki} + e_i$

Prediction: $y^* = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$

Slope (bj): Estimated Y changes by bj for each 1 unit increase in Xj, holding other variables constant

$y^* + \Delta y = b_0 + b_1 x_1 + \dots + b_j (x_j + 1) + \dots + b_k x_k$, so $\Delta y = b_j$

More generally, $y^* + \Delta y = b_0 + b_1 x_1 + \dots + b_j (x_j + \Delta x_j) + \dots + b_k x_k$, so $\Delta y = b_j \Delta x_j$, that is, $\Delta y / \Delta x_j = b_j$

Y-Intercept (b0): Estimated value of Y when X1 = X2 = … = Xk = 0


Parameter Estimation Example

You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).

You’ve collected the following data:

Resp (y) Size (x1) Circ (x2)
1 1 2
4 8 8
1 3 1
3 5 7
2 6 4
4 10 6


Parameter Estimation Computer Output

Parameter Estimates

Variable DF Parameter Estimate Standard Error T for H0: Param=0 Prob>|T|

INTERCEP 1 0.0640 0.2599 0.246 0.8214

ADSIZE 1 0.2049 0.0588 3.484 0.0399

CIRC 1 0.2805 0.0686 4.089 0.0264

Slope (b1): The number of responses to an ad is expected to increase by 0.2049 hundred (20.49 responses) for each 1 sq. in. increase in ad size, holding circulation constant.

Slope (b2): The number of responses to an ad is expected to increase by 0.2805 hundred (28.05 responses) for each 1 unit (1,000) increase in circulation, holding ad size constant.
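The estimates in the output can be reproduced from the six observations of the ad-response example by solving the least squares normal equations $(X'X) b = X'y$. The sketch below uses only the Python standard library; the helper names `solve` and `ols` are hypothetical and not part of any package, and the data are read as six rows of (responses, ad size, circulation).

```python
# Ad-response example: responses (00), ad size (sq. in.), circulation (000).
resp = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
circ = [2, 8, 1, 7, 4, 6]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols(columns, y):
    """Least squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

b0, b1, b2 = ols([size, circ], resp)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The three printed values correspond to the INTERCEP, ADSIZE and CIRC rows of the output.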


Interpreting the Standard Error of the Estimate

Assumptions:

Observed Y values are normally distributed around each estimated value of Y*

Constant variance

$s_e$ measures the dispersion of the points around the regression line

If $s_e$ = 0, the equation is a “perfect” estimator

$s_e$ may be used to compute confidence intervals of the estimated value


Test of Slope Coefficient (bj)

1. Tests if there is a linear relationship between Xj & Y after other variables are controlled for.

2. Involves population slope $\beta_j$

3. Hypotheses

H0: $\beta_j = 0$ (Xj should not appear in the linear relationship)

H1: $\beta_j \neq 0$

4. Theoretical basis is sampling distribution of slopes


Basis for Inference About the Population Regression Slope

Let $\beta_j$ be a population regression slope and bj its least squares estimate based on n data points. Then, if the standard regression assumptions hold and it can also be assumed that the errors $\varepsilon_i$ are normally distributed, the random variable

$t = (b_j - \beta_j) / s_{b_j}$

is distributed as Student’s t with (n - k - 1) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes, n.


Confidence Intervals for the Population Regression Slope $\beta_j$

If the regression errors $\varepsilon_i$ are normally distributed and the standard regression assumptions hold, a $100(1 - \alpha)\%$ confidence interval for the population regression slope $\beta_j$ is given by

$b_j - t_{(n-k-1),\,\alpha/2}\, s_{b_j} < \beta_j < b_j + t_{(n-k-1),\,\alpha/2}\, s_{b_j}$
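As a worked instance of the interval, the sketch below uses the ADSIZE row of the ad-response output earlier in this lesson (estimate 0.2049, standard error 0.0588, n = 6, k = 2). The critical value 3.182 is taken from a standard t table for 3 degrees of freedom and $\alpha/2 = 0.025$; it is an input assumed here, not something the code computes, and the function name is hypothetical.

```python
def slope_confidence_interval(b_j, s_bj, t_crit):
    """Return (lower, upper) = b_j -/+ t_crit * s_bj."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width

# ADSIZE row of the ad-response output: n = 6 observations, k = 2 regressors.
n, k = 6, 2
df = n - k - 1           # degrees of freedom for the t distribution
T_CRIT_3DF_95 = 3.182    # t table value for df = 3, alpha/2 = 0.025 (assumed input)
lower, upper = slope_confidence_interval(0.2049, 0.0588, T_CRIT_3DF_95)
print(df, round(lower, 4), round(upper, 4))
```

The resulting 95% interval lies entirely above zero, which agrees with the reported p-value of 0.0399 being below 0.05.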


Some cautions about the interpretation of significance tests

Rejecting H0: $\beta_j = 0$ and concluding that the relationship between xj and y is significant does not enable us to conclude that a cause-and-effect relationship is present between xj and y.

Causation requires: association; accurate time sequence; ruling out other explanations for the correlation

Correlation ≠ Causation


Some cautions about the interpretation of significance tests

Just because we are able to reject H0: $\beta_j = 0$ and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.

Linear relationships are a very small subset of the possible relationships among variables.

A test of linear versus nonlinear relationship requires a separate analysis.


Evaluating the Model

$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki} + e_i$

Are the assumptions valid?

Assumption #1: Linearity

Assumption #2: The relevant set of variables should be included.

Assumption #3: The explanatory variables are uncorrelated with the error term.

Assumption #4: The error term has a constant variance.

Assumption #5: The errors are independent of each other.


Measures of Variation in Regression

Total Sum of Squares (SST): Measures variation of observed Yi around the mean, $\bar{Y}$

Explained Variation (SSR): Variation due to relationship between X & Y

Unexplained Variation (SSE): Variation due to other factors

SST = SSR + SSE


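Variation in y (SST) = SSR + SSE

SST: $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - y_i^* + y_i^* - \bar{y})^2$

$= \sum_{i=1}^{n} [(y_i - y_i^*)^2 + (y_i^* - \bar{y})^2 + 2 (y_i - y_i^*)(y_i^* - \bar{y})]$

$= \sum_{i=1}^{n} (y_i - y_i^*)^2 + \sum_{i=1}^{n} (y_i^* - \bar{y})^2 + 2 \sum_{i=1}^{n} (y_i - y_i^*)(y_i^* - \bar{y})$

$= \sum_{i=1}^{n} (y_i - y_i^*)^2 + \sum_{i=1}^{n} (y_i^* - \bar{y})^2$ = SSE + SSR

The cross-product term equals 0, as imposed in the estimation by $E(\varepsilon x) = 0$.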


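Variation Measures

[Figure: Y plotted against X with the fitted line $y_i^* = b_0 + b_1 x_i$. For an observation at $X_i$, the plot marks the total sum of squares $(Y_i - \bar{Y})^2$ (SST), the unexplained sum of squares $(Y_i - Y_i^*)^2$ (SSE), and the explained sum of squares $(Y_i^* - \bar{Y})^2$ (SSR).]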


R2 (= r2, the coefficient of determination) measures the proportion of the variation in y that is explained by the variation in x.

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - y_i^*)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

R2 takes on any value between zero and one.

R2 = 1: Perfect match between the line and the data points.

R2 = 0: There is no linear relationship between x and y.

Variation in y (SST) = SSR + SSE


Adjusted R-square

(Unadjusted) R-square never decreases as more variables are included. Thus, using R-square as a measure, we would always conclude that a model with more variables is better.

However, adding a new variable is costly. An additional variable may add to the uncertainty of estimating y.

Thus, we would like to have a measure that penalizes the addition of variables.

$\bar{R}^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$

For a fixed R2, adjusted R2 decreases with k.

For a fixed k, adjusted R2 increases with R2.
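The adjustment can be checked numerically. The sketch below applies the formula to the Excel regression output shown later in this lesson (R Square 0.52148, 15 observations, 2 explanatory variables) and then repeats the calculation with a hypothetical k = 5 to show the penalty for extra variables. The function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# R-square 0.52148 with n = 15 and k = 2 (from the Excel output in this lesson).
adj = adjusted_r2(0.52148, n=15, k=2)

# Holding R-square fixed, a larger k (hypothetical here) lowers the adjusted value.
adj_more_vars = adjusted_r2(0.52148, n=15, k=5)
print(round(adj, 4), round(adj_more_vars, 4))
```

The first value agrees with the Adjusted R Square line of that output up to rounding of the inputs.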


International price discrimination

Cabolis, Christos, Sofronis Clerides, Ioannis Ioannou and Daniel Senft (2007): “A textbook example of international price discrimination,” Economics Letters, 95(1): 91-95.


Motivation

International price comparisons have a long history in economics. Macroeconomists have used them extensively to test for purchasing power parity and the law of one price. International trade economists have been interested in international price differences as evidence of trade barriers while industrial organization economists have studied issues of market structure. The popular and business press have also shown a keen interest and frequently report intercity price comparisons for standardized products such as the Big Mac or a Starbucks cappuccino.

The paper documents the existence of very large differences in the prices of textbooks across countries.


Data

Our data were collected from the Internet sites of Amazon.com in two distinct phases. In May 2002 we collected information on prices and characteristics of 268 books that were on sale on both the US and UK websites of Amazon, Inc. This data set includes both textbooks and general audience books and we refer to it as our “broad sample”. In December 2002, we collected additional data on economics textbooks; this is our “econ sample”. In this phase, we broadened our sample by including Canada in the search and collected more detailed information about each book.

We tested for price differences by running a simple hedonic regression of price on book characteristics and on dummy variables that aim to capture differences across countries and book types.


Estimates from the broad sample; dependent variable: ln(p)

Variable Coefficient estimate Standard error

Intercept 1.045 0.272

Textbook 0.268 0.052

US general book 0.126 0.044

US Textbook 0.306 0.031

Ln(pages) 0.345 0.048

Hardcover 0.343 0.044

N 536

R2 0.454

F-stat 56.52

Notes: Coefficients that are statistically different from zero at 5% and 1% are marked with “*” and “**” respectively.


Estimates from the Economics sample dependent variable: ln(p)

Variable Commercial hard. Univ. press hard. Commercial paper Univ. press paper

US 0.478** (0.043) 0.143** (0.045) 0.008 (0.072) −0.048 (0.026)

CA 0.248** (0.049) 0.132** (0.03) −0.032 (0.066) 0.011 (0.036)

US-INTRO 0.027 (0.045) 0.310* (0.124)

CA-INTRO 0.074 (0.062) 0.231 (0.149)

DELTIME 0.024** (0.006) 0.021* (0.008) −0.004 (0.011) 0.007 (0.006)

N 304 170 109 99

R2 0.303 0.152 0.223 0.413

F-stat 40.23 6.3 3.92 15.64

Notes: Coefficients that are statistically different from zero at 5% and 1% are marked with “*” and “**” respectively.


Testing for Linearity

Key Argument:

If the value of y does not change linearly with the value of x, then using the mean value of y is the best predictor for the actual value of y. This implies $y = \bar{y}$ is preferable.

If the value of y does change linearly with the value of x, then using the regression model gives a better prediction for the value of y than using the mean of y. This implies y = y* is preferable.


Testing for Linearity

The Global F-test

H0: β1 = β2 = … = βk = 0 (no linear relationship)

H1: at least one βj ≠ 0 (at least one independent variable affects Y)

Test Statistic:

$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)} = \frac{\sum_{i=1}^{n} (y_i^* - \bar{y})^2 / k}{\sum_{i=1}^{n} (y_i - y_i^*)^2 / (n-k-1)}$

F is distributed with k numerator degrees of freedom and n-k-1 denominator degrees of freedom. Reject H0 if $F > F_{k,\,n-k-1,\,\alpha}$

[Variation in y] = SSR + SSE. A large F results from a large SSR. Then, much of the variation in y is explained by the regression model. The null hypothesis should be rejected; thus, the model is valid.

Under the null, SSR is either zero or very small!!


F-Test for Overall Significance

Regression Statistics

Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15

ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888

$F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$, with 2 and 12 degrees of freedom

The P-value for the F-test is the Significance F entry, 0.01201.


F-Test for Overall Significance (continued)

H0: β1 = β2 = 0

H1: β1 and β2 not both zero

$\alpha$ = .05, df1 = 2, df2 = 12

Critical Value: $F_{.05}$ = 3.885

Test Statistic: $F = \frac{MSR}{MSE} = 6.5386$

Decision: Since the F test statistic is in the rejection region (p-value < .05), reject H0

Conclusion: There is evidence that at least one independent variable affects Y

[Figure: F distribution with the rejection region ($\alpha$ = .05) to the right of the critical value 3.885; do not reject H0 to the left, reject H0 to the right.]
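The overall F test can be recomputed directly from the sums of squares in the ANOVA table. The sketch below is a minimal Python version; the critical value 3.885 is the one quoted in this example and is supplied as an input, not computed, and the function name is hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """Global F statistic: (SSR / k) / (SSE / (n - k - 1))."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

# Sums of squares from the ANOVA table: n = 15 observations, k = 2 regressors.
ssr, sse, n, k = 29460.027, 27033.306, 15, 2
f_stat = f_statistic(ssr, sse, n, k)

F_CRIT = 3.885                 # F(.05; 2, 12), as quoted in the example
reject_h0 = f_stat > F_CRIT    # True means at least one slope is nonzero
r_squared = ssr / (ssr + sse)  # same inputs also give R-square
print(round(f_stat, 4), reject_h0, round(r_squared, 4))
```

The same two sums of squares also reproduce the R Square value, which shows how closely the F statistic and R-square are tied together.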


Tests on a Subset of Regression Coefficients

Consider a multiple regression model involving variables xj and zj, and the null hypothesis that the z variable coefficients are all zero:

$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \gamma_1 z_{1i} + \dots + \gamma_r z_{ri} + \varepsilon_i$

H0: $\gamma_1 = \gamma_2 = \dots = \gamma_r = 0$

H1: at least one $\gamma_j \neq 0$ (j = 1, …, r)

Under the null, SSR due to Z is either zero or very small!!


Tests on a Subset of Regression Coefficients

Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model

First run a regression for the complete model and obtain SSE

Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r).

Compute the F statistic and apply the decision rule for a significance level $\alpha$:

Reject H0 if $F = \frac{(SSE(r) - SSE)/r}{SSE/(n-k-r-1)} > F_{r,\,n-k-r-1,\,\alpha}$

Note: $SSE/(n-k-r-1) = s_e^2$, the estimated error variance of the complete model, which has k + r explanatory variables.
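The decision rule can be illustrated with the food expenditure example (Example 1) reported later in this lesson: the complete model (Income, Size, Student) has a residual sum of squares of 2,623,764 on 8 degrees of freedom, and the restricted model with Size only has 3,110,690. The sketch below tests whether Income and Student can be dropped jointly (r = 2). The critical value 4.46 for F(.05; 2, 8) is taken from a standard F table and is an assumed input; the function name is hypothetical.

```python
def subset_f_statistic(sse_restricted, sse_complete, r, df_complete):
    """F = [(SSE(r) - SSE) / r] / [SSE / df], df = error df of the complete model."""
    numerator = (sse_restricted - sse_complete) / r
    denominator = sse_complete / df_complete
    return numerator / denominator

# Food expenditure example: complete model vs. model with Size only.
sse_complete = 2623764.0    # Income, Size, Student; 12 - 3 - 1 = 8 error df
sse_restricted = 3110690.0  # Size only
r = 2                       # number of excluded variables (Income, Student)
f_stat = subset_f_statistic(sse_restricted, sse_complete, r, df_complete=8)

F_CRIT = 4.46               # F(.05; 2, 8) from a standard F table (assumed input)
reject_h0 = f_stat > F_CRIT
print(round(f_stat, 3), reject_h0)
```

Here the statistic falls well below the critical value, so the two excluded variables are not jointly significant, which is consistent with dropping them in that example.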


EXAMPLE 1

A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in $00, size of family (Size), and whether the family has children in college (College).


Example 1 continued

Note the following regarding the regression equation. The variable College is called a dummy or indicator variable.

It can take only one of two possible values: a child either is a college student or is not.

Other examples of dummy variables include gender, whether a part is acceptable or unacceptable, and whether a voter will or will not vote for the incumbent governor.

We usually code one value of the dummy variable as “1” and the other “0.”


EXAMPLE 1 continued

Family Food Income Size Student

1 3900 376 4 0

2 5300 515 5 1

3 4300 516 4 0

4 4900 468 5 0

5 6400 538 6 1

6 7300 626 7 1

7 4900 543 5 0

8 5300 437 4 0

9 6100 608 5 1

10 6400 513 6 1

11 7400 493 6 1

12 5800 563 5 0


EXAMPLE 1 continued

Use a computer software package, such as Excel, to develop a correlation matrix.

From the analysis provided by Excel, write out the regression equation:

Y*= 954 +1.09X1 + 748X2 + 565X3

What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?


EXAMPLE 1 continued

The regression equation is

Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor Coef SE Coef T P
Constant 954 1581 0.60 0.563
Income 1.092 3.153 0.35 0.738
Size 748.4 303.0 2.47 0.039
Student 564.5 495.1 1.14 0.287

S = 572.7 R-Sq = 80.4% R-Sq(adj) = 73.1%

Analysis of Variance

Source DF SS MS F P
Regression 3 10762903 3587634 10.94 0.003
Residual Error 8 2623764 327970
Total 11 13386667


EXAMPLE 1 continued

From the regression output we note:

The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

Each additional $100 of income per year (one unit of Income) is estimated to increase the amount spent on food by $1.09 per year, holding the other variables constant.

An additional family member is estimated to increase the amount spent per year on food by $748.

A family with a college student is estimated to spend $565 more per year on food than one without a college student.


EXAMPLE 1 continued

The estimated food expenditure for a family of 4 with an Income value of 500 (that is, $50,000) and no college student is $4,491.

Y* = 954 + 1.09(500) + 748(4) + 565(0) = 4491


EXAMPLE 1 continued

Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero.

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_1$: Not all $\beta$s equal 0

H0 is rejected if F > 4.07.

From the computer output, the computed value of F is 10.94.

Decision: H0 is rejected. Not all the regression coefficients are zero.


EXAMPLE 1 continued

Conduct an individual test to determine which coefficients are not zero. These are the hypotheses for the independent variable family size:

$H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$

Using the 5% level of significance, reject H0 if the p-value < .05.

From the computer output, the only significant variable is SIZE (family size) using the p-values. The other variables can be omitted from the model.


Correlation Matrix

A correlation matrix is used to show all possible simple correlation coefficients among the variables. See which xj are most correlated with y, and which xj are strongly correlated with each other.

      y      x1           x2             …   xk
y     1.00   $r_{x_1 y}$  $r_{x_2 y}$    …   $r_{x_k y}$
x1           1.00         $r_{x_1 x_2}$  …   $r_{x_1 x_k}$
x2                        1.00           …   $r_{x_2 x_k}$
xk                                           1.00


Multicollinearity

1. High correlation between X variables

2. Multicollinearity makes it difficult to separate the effect of x1 on y from the effect of x2 on y. Leads to unstable coefficients depending on X variables in model

3. Always exists; it is a matter of degree

4. Example: using both age & height as explanatory variables in same model


Detecting Multicollinearity

1. Examine correlation matrix

Warning sign: correlations between pairs of X variables are larger than their correlations with the Y variable

2. Few remedies

Obtain new sample data

Eliminate one correlated X variable


EXAMPLE 1 continued

The correlation matrix is as follows:

         Food   Income  Size
Income   0.587
Size     0.876  0.609
Student  0.773  0.491   0.743

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.

Most of the correlations among the independent variables fall between -.70 and .70, the range usually considered unproblematic. The one exception is Size and Student (0.743), which is slightly outside that range.
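The correlation matrix for Example 1 can be rebuilt from the twelve families in the data table. The sketch below computes every pairwise Pearson correlation with the Python standard library; the function and variable names are hypothetical.

```python
import math

# Example 1 data: yearly food spending ($), income ($00), family size, student dummy.
food    = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
income  = [376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563]
size    = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
student = [0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]

def pearson(u, v):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

variables = {"Food": food, "Income": income, "Size": size, "Student": student}
names = list(variables)
corr = {
    (a, b): pearson(variables[a], variables[b])
    for i, a in enumerate(names)
    for b in names[:i]  # lower triangle only, as in the slide
}
for (a, b), r in corr.items():
    print(f"{a:8s} {b:8s} {r:6.3f}")
```

Printing the lower triangle makes it easy to compare each pair of explanatory variables against the ±.70 rule of thumb.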


EXAMPLE 1 continued

We rerun the analysis using only the significant independent variable, family size.

The new regression equation is:

Y* = 340 + 1031X2

The coefficient of determination is 76.8 percent. We dropped two independent variables, and the R-square term was reduced by only 3.6 percentage points.


Example 1 continued

Regression Analysis: Food versus Size

The regression equation is

Food = 340 + 1031 Size

Predictor Coef SE Coef T P
Constant 339.7 940.7 0.36 0.726
Size 1031.0 179.4 5.75 0.000

S = 557.7 R-Sq = 76.8% R-Sq(adj) = 74.4%

Analysis of Variance

Source DF SS MS F P
Regression 1 10275977 10275977 33.03 0.000
Residual Error 10 3110690 311069
Total 11 13386667


Residual Analysis

Purposes

Evaluate violations of assumptions, including the assumption of linearity.

Graphical Analysis of Residuals

Plot residuals versus Xi values

Residual: difference between actual Yi & predicted Yi*

Studentized residuals: allow consideration for the magnitude of the residuals


Residual Analysis for Homoscedasticity

When the requirement of a constant variance (homoscedasticity) is violated, we have heteroscedasticity.

For example, for xi > xj, $Var(\varepsilon_i|x_i) > Var(\varepsilon_j|x_j)$

[Figure: two plots of standardized residuals (SR = e/se) against X. One panel shows heteroscedasticity; the other shows homoscedasticity (OK).]


Residual Analysis for Independence

[Figure: two plots of standardized residuals (SR = e/se) against X. One panel shows residuals that are not independent; the other shows independent residuals (OK).]


[Figure: two plots of residuals against time, each with a horizontal reference line at 0. In one panel, note the runs of positive residuals, replaced by runs of negative residuals. In the other, note the oscillating behavior of the residuals around zero.]

Patterns in the appearance of the residuals over time indicate that autocorrelation exists.


The Durbin-Watson Statistic

Used when data is collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period)

Measures violation of independence assumption

$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$

Should be close to 2. If not, examine the model for autocorrelation.

Intuition: If x and y are independent, Var(x-y) = Var(x) + Var(y), so with independent residuals the numerator is roughly twice the denominator.
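A short Python sketch of the statistic follows. The two residual series are hypothetical and were made up to mimic the two patterns described for residuals over time: long runs of one sign, and oscillation around zero. The function name is hypothetical.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals, ordered in time.
runs = [1.0, 1.2, 0.8, 0.9, -1.1, -0.9, -1.0, -0.8]         # runs of one sign
oscillating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips each period

dw_runs = durbin_watson(runs)
dw_oscillating = durbin_watson(oscillating)
print(round(dw_runs, 3), round(dw_oscillating, 3))
```

Runs of same-signed residuals push D toward 0, while oscillating residuals push it toward 4; values near 2 are what independence would produce.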


Outliers

An outlier is an observation that is unusually small or large.

Several possibilities need to be investigated when an outlier is observed:

There was an error in recording the value.

The point does not belong in the sample.

The observation is valid.

Identify outliers from the scatter diagram.

It is customary to suspect an observation is an outlier if its |standardized residual| > 2


[Figure: two scatter plots. One shows an outlier; the other shows an influential observation, where the outlier causes a shift in the regression line.]

… but, some outliers may be very influential


Remedying violations of the required conditions

Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.

The transformations can improve the linear relationship between the dependent variable and the independent variables.

Many computer software systems allow us to make the transformations easily.


Nonlinear Regression Models

The relationship between the dependent variable and an independent variable may not be linear

Can review the scatter diagram to check for non-linear relationships

Example: Quadratic model

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \varepsilon$

The second independent variable is the square of the first variable


Quadratic Regression Model

Model form:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$

where:

β0 = Y intercept

β1 = regression coefficient for linear effect of X on Y

β2 = regression coefficient for quadratic effect on Y

εi = random error in Y for observation i


Linear vs. Nonlinear Fit

Linear fit does not give random residuals

Nonlinear fit gives random residuals

[Figure: for each fit, a plot of Y against X with the fitted curve and a plot of the residuals against X.]


Quadratic Regression Model

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$

β1 = the coefficient of the linear term

β2 = the coefficient of the squared term

Quadratic models may be considered when the scatter diagram takes on one of the following shapes:

[Figure: four plots of Y against X1, one for each sign combination: (β1 < 0, β2 > 0), (β1 > 0, β2 > 0), (β1 < 0, β2 < 0), (β1 > 0, β2 < 0).]


Testing for Significance: Quadratic Effect

Testing the Quadratic Effect

Compare the linear regression estimate

$\hat{y} = b_0 + b_1 x_1$

with the quadratic regression estimate

$\hat{y} = b_0 + b_1 x_1 + b_2 x_1^2$

Hypotheses

H0: β2 = 0 (The quadratic term does not improve the model)

H1: β2 ≠ 0 (The quadratic term improves the model)



Testing the Quadratic Effect

Hypotheses

H0: β2 = 0 (The quadratic term does not improve the model)

H1: β2 ≠ 0 (The quadratic term improves the model)

The test statistic is

$t = \frac{b_2 - \beta_2}{s_{b_2}}$, d.f. = n - 3

where:

b2 = squared term slope coefficient

β2 = hypothesized slope (zero)

$s_{b_2}$ = standard error of the slope



Testing the Quadratic Effect

Compare Adjusted R2 from simple regression to

Adjusted R2 from the quadratic model

If Adjusted R2 from the quadratic model is larger than Adjusted R2 from the simple model, then the quadratic model is a better model


Example: Quadratic Model

Purity increases as filter time increases:

Purity Filter Time
3 1
7 2
8 3
15 5
22 7
33 8
40 10
54 12
67 13
70 14
78 15
85 15
87 16
99 17

[Figure: scatter plot of Purity vs. Time.]




Simple regression results: y* = -11.283 + 5.985 Time

R Square 0.96888

F Significance F
373.57904 2.0778E-10

Coefficients Standard Error t Stat P-value
Intercept -11.28267 3.46805 -3.25332 0.00691
Time 5.98520 0.30966 19.32819 2.078E-10

The t statistic, F statistic, and R2 are all high. But the residuals are not random:

[Figure: Time residual plot (residuals against Time).]


Quadratic regression results: $\hat{y} = 1.539 + 1.565\,Time + 0.245\,(Time)^2$

R Square 0.99494

F Significance F
1080.7330 2.368E-13

Coefficients Standard Error t Stat P-value
Intercept 1.53870 2.24465 0.68550 0.50722
Time 1.56496 0.60179 2.60052 0.02467
Time-squared 0.24516 0.03258 7.52406 1.165E-05


[Figure: Time residual plot (residuals against Time) and Time-squared residual plot (residuals against Time-squared).]

The quadratic term is significant and improves the model: R2 is higher and se is lower, and the residuals are now random.
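Both fits in this example can be reproduced from the fourteen (Purity, Filter Time) observations by solving the least squares normal equations, first with Time alone and then with Time and Time-squared. The sketch below uses only the Python standard library; `solve` and `fit` are hypothetical helper names, not functions from any package.

```python
# Purity example: filter time and purity for 14 observations.
time_ = [1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17]
purity = [3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(columns, y):
    """Return (coefficients with intercept first, R-square) for a least squares fit."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1.0 - sse / sst

time_sq = [t * t for t in time_]
linear_coef, linear_r2 = fit([time_], purity)
quad_coef, quad_r2 = fit([time_, time_sq], purity)
print([round(c, 3) for c in linear_coef], round(linear_r2, 5))
print([round(c, 3) for c in quad_coef], round(quad_r2, 5))
```

Adding the squared term is just one more column in the regression, which is why a quadratic model can still be estimated by ordinary least squares.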


The Log Transformation

Some highly nonlinear models may be transformed into a linear model.

The Multiplicative Model:

Original multiplicative model: $Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \varepsilon$

Transformed multiplicative model: $\log(Y) = \log(\beta_0) + \beta_1 \log(X_1) + \beta_2 \log(X_2) + \log(\varepsilon)$


Interpretation of coefficients

For the multiplicative model:

$\log Y_i = \log \beta_0 + \beta_1 \log X_{1i} + \log \varepsilon_i$

When both dependent and independent variables are logged:

The coefficient of the independent variable X1 can be interpreted as follows: a 1 percent change in X1 leads to an estimated b1 percent change in the average value of Y

b1 is the elasticity of Y with respect to a change in X1

Note: $\log Y = b_0 + b_1 \log X$ implies $b_1 = \Delta \log Y / \Delta \log X = \%\Delta Y / \%\Delta X$

$\Delta \log Y = \log Y_2 - \log Y_1 = \log(Y_2/Y_1) = \log(1 + (Y_2 - Y_1)/Y_1) \approx (Y_2 - Y_1)/Y_1$


Dummy Variables

A dummy variable is a categorical independent variable with two levels: yes or no, on or off, male or female

Recorded as 0 or 1

Regression intercepts are different if the variable is significant

Assumes equal slopes for other variables

If more than two levels, the number of dummy variables needed is (number of levels - 1)


Dummy variable example

Interested in: Does the average income differ between males and females?

Compute the average income for females. Compute the average income for males. Conduct a two-sample test of equal means.

Alternative approach: regression.

$Y = \beta_0 + \beta_1 X_1 + \varepsilon$

Y = income; X1 = 1 if male, 0 if female.

X1 = 0 implies $Y = \beta_0 + \varepsilon$

X1 = 1 implies $Y = \beta_0 + \beta_1 + \varepsilon$

Test H0: $\beta_1 = 0$.
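The link between the two approaches can be seen numerically: with a single 0/1 regressor, the least squares intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The sketch below checks this on hypothetical income figures, which were made up purely for illustration.

```python
# Hypothetical incomes (in thousands) and a male dummy; illustrative only.
income = [30.0, 34.0, 38.0, 41.0, 45.0, 52.0]
male = [0, 0, 0, 1, 1, 1]

def mean(values):
    return sum(values) / len(values)

# Least squares with one regressor: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar.
x_bar, y_bar = mean(male), mean(income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(male, income))
s_xx = sum((x - x_bar) ** 2 for x in male)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Group means computed directly for comparison.
mean_female = mean([y for x, y in zip(male, income) if x == 0])
mean_male = mean([y for x, y in zip(male, income) if x == 1])
print(b0, b1, mean_female, mean_male - mean_female)
```

Testing H0: β1 = 0 in the regression therefore asks the same question as the two-sample test of equal means.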


Dummy Variable Example

Let:

y = Pie Sales

x1 = Price

x2 = Holiday (x2 = 1 if a holiday occurred during the week; x2 = 0 if there was no holiday that week)

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$


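Dummy Variable Example

Holiday (x2 = 1): $\hat{y} = b_0 + b_1 x_1 + b_2 (1) = (b_0 + b_2) + b_1 x_1$

No Holiday (x2 = 0): $\hat{y} = b_0 + b_1 x_1 + b_2 (0) = b_0 + b_1 x_1$

Different intercept, same slope

[Figure: y (sales) against x1 (Price), showing two parallel lines with intercepts b0 + b2 (Holiday) and b0 (No Holiday).]

If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales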


Interpreting the Dummy Variable Coefficient

Example:

Sales = 300 - 30(Price) + 15(Holiday)

Sales: number of pies sold per week

Price: pie price in $

Holiday: 1 if a holiday occurred during the week; 0 if no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price


Interaction Between Explanatory Variables

Hypothesizes interaction between pairs of x variables

Response to one x variable may vary at different levels of another x variable

Contains two-way cross product terms

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 = b_0 + b_1 x_1 + b_2 x_2 + b_3 (x_1 x_2)$


Effect of Interaction

Given:

$Y = \beta_0 + \beta_2 X_2 + (\beta_1 + \beta_3 X_2) X_1 = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$

Without interaction term, effect of X1 on Y is measured by β1

With interaction term, effect of X1 on Y is measured by β1 + β3 X2, which changes as X2 changes


Interaction Example

Suppose x2 is a dummy variable and the estimated regression equation is $\hat{y} = 1 + 2x_1 + 3x_2 + 4x_1 x_2$

x2 = 1: $\hat{y}$ = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1

x2 = 0: $\hat{y}$ = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1

Slopes are different if the effect of x1 on y depends on x2 value

[Figure: y against x1 (x1 from 0 to 1.5, y from 0 to 12), showing the two lines above.]
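The two lines in this example can be checked with a few lines of Python. The coefficients (1, 2, 3, 4) are those of the estimated equation above; the function names are hypothetical.

```python
# Estimated equation from the example: y = 1 + 2*x1 + 3*x2 + 4*x1*x2.
B0, B1, B2, B3 = 1.0, 2.0, 3.0, 4.0

def predict(x1, x2):
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def slope_wrt_x1(x2):
    """Effect of a one-unit change in x1, which depends on x2: b1 + b3*x2."""
    return B1 + B3 * x2

slope_no = slope_wrt_x1(0)   # slope of the x2 = 0 line
slope_yes = slope_wrt_x1(1)  # slope of the x2 = 1 line
y_no = predict(1.5, 0)       # point on the x2 = 0 line at x1 = 1.5
y_yes = predict(1.5, 1)      # point on the x2 = 1 line at x1 = 1.5
print(slope_no, slope_yes, y_no, y_yes)
```

The difference between the two slopes equals b3, the coefficient on the cross product term.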


Significance of Interaction Term

The coefficient b3 is an estimate of the difference in the coefficient of x1 when x2 = 1 compared to when x2 = 0

The t statistic for b3 can be used to test the hypothesis

$H_0: \beta_3 = 0$

$H_1: \beta_3 \neq 0$

If we reject the null hypothesis we conclude that there is a difference in the slope coefficient for the two subgroups


- END -
