
P1.T2.Quantitative Analysis:

OLS Linear Regression Chapters 4 - 7 of Stock and Watson

FRM 2012 Practice Questions

By David Harper, CFA FRM CIPM

www.bionicturtle.com


Table of Contents

Selected Ideas from S&W Chapters 4 - 7

Question 214: Regression lines (Stock & Watson)

Question 215: Properties of linear regression (Stock & Watson)

Question 216: Regression sums of squares: ESS, SSR, and TSS

Question 217: Regression coefficients (Stock & Watson)

Question 218: Theory of Ordinary Least Squares (Stock & Watson)

Question 219: Omitted variable bias

Question 220: OLS estimators in a multiple regression

Question 221: Joint null hypothesis in multiple OLS regression

Question 222: Homoskedasticity-only F-statistic

Question 223: Adjusted R^2 in a multiple regression


Selected Ideas from S&W Chapters 4 - 7

An ordinary least squares (OLS) linear regression with one regressor (a.k.a., independent or explanatory variable) is given by:

Y(i) = B(0) + B(1)*X(i) + u(i)

o The error term, u(i), contains all of the other factors aside from X that determine the value of the regressand (dependent variable), Y, for a specific observation.

o The t-statistic tests the null hypothesis that the population mean equals a certain value.

The key assumptions of the OLS linear regression model are:

o The conditional distribution of u(i) given X(1i), X(2i), ..., X(ki) has a mean of zero

o X(1i), X(2i), … X(ki), Y(i) are independent and identically distributed (i.i.d.)

o Large outliers are unlikely

o No perfect collinearity (in the case of a multiple regression; i.e., two or more regressors)

To test the significance of a coefficient (since we do not know the population variance), we compute a t-ratio, which has a Student's t distribution:

t = [b(1) - B(1)] / SE[b(1)] = (regression coefficient - null hypothesis [0]) / SE(regression coefficient)

The coefficient of determination is given by:

R^2 = ESS/TSS = 1 - SSR/TSS

The adjusted R^2 is given by:

Adjusted R^2 = 1 - [(n-1)/(n-k-1)] * (SSR/TSS)

The standard error of the regression (SER) is given by:

SER = SQRT[SSR/(n-k-1)]

Where k = the number of slope coefficients; e.g., in the case of a single-variable regression, the denominator is (n-2).
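
To make these summary formulas concrete, here is a minimal Python sketch (standard library only) that computes R^2, the adjusted R^2, and the SER from the sums of squares; the values of n, k, TSS, and SSR are hypothetical, chosen for illustration only.

```python
import math

n, k = 36, 1           # hypothetical: 36 observations, 1 slope coefficient
TSS, SSR = 3.23, 2.65  # hypothetical sums of squares
ESS = TSS - SSR        # identity: TSS = ESS + SSR

R2 = ESS / TSS                                   # = 1 - SSR/TSS
adj_R2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS   # adjusted R^2
SER = math.sqrt(SSR / (n - k - 1))               # standard error of the regression

print(R2, adj_R2, SER)
```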


Question 214: Regression lines (Stock & Watson)

AIMs: Explain how regression analysis in econometrics measures the relationship between dependent and independent variables. Define and interpret a population regression function, regression coefficients, parameters, slope and the intercept. Define and interpret the stochastic error term (or noise component).

214.1. According to the capital asset pricing model (CAPM), the expected return of a security: E[R(i)] = Rf + B(i,M)*RiskPrice(M), where R(i) is the security's return, Rf is the riskfree rate, B(i,M) is the security's beta with respect to the market, and RiskPrice(M) is the market risk premium which is also known as market's "price of risk." The riskfree rate is 3.0% and RiskPrice(M) is 4.0%. We conduct a regression analysis for a stock and discover that, with respect to the market, the stock's correlation and beta are, respectively, 0.50 and 1.50. That is, rho(stock, market) = 0.50 and beta(stock, market) = 1.50. If the volatilities of the overall market and the stock do not change, but their correlation, rho(stock, market), increases to 0.80, what is the CHANGE in the stock's expected return?

a) +0.30% (30 basis points) b) +1.2% c) +2.4% d) +3.6%

214.2. A regression of average weekly earnings (AWE, measured in dollars) on age (AGE, measured in years) using a random sample of college-educated full-time workers aged 25-65 is given by: AWE = $600 + 8.3*AGE. According to the regression model, what is the expected weekly pay difference between a 35-year-old worker and a 45-year-old worker? (adapted from Stock & Watson Question 4.3).

a) $52.50 b) $83.00 c) $973.50 d) Not enough information

214.3. Pretend GARP regressed the exam scores (FRMScore) against preparation time (Hours) and returned the following regression: FRMScore(i) = 23.2 + 0.18*Hours(i) + u(i). Which of the following is the best interpretation of the error term, u(i)?

a) It allows users to adjust the intercept to give it a "real world" interpretation b) It contains the assumed but unobserved correlation between the error term and the regressor (independent variable) c) The error term represents all of the factors other than preparation time that influence the score d) It is the estimator of the standard deviation of the regression error


Answers:

214.1. D. +3.6%

The beta(stock, market) increases from 1.50 to 2.40: 0.80/0.50 * 1.50 = 2.40. Or, put another way, since beta = cov(stock, market)/variance(market) = rho(stock, market)*volatility(stock)/volatility(market), in this case: 1.50 beta = 0.50 correlation * volatility(stock)/volatility(market) = 0.50 correlation * cross-volatility. If cross-volatility is constant, then 0.80 correlation implies a revised beta of (0.80/0.50)*1.50 = 2.40. If the beta increases by 2.40 - 1.50 = 0.90, then the expected return increases by 0.90 * MRP = 0.90 * 4.0% = 3.60%. Note: the riskfree rate has no impact on the change in expected return.

214.2. B. $83.00

10 years * $8.3 = $83.00. E[earnings for 35-year-old] = 600 + 8.3*35 = $890.50; E[earnings for 45-year-old] = 600 + 8.3*45 = $973.50; 973.50 - 890.50 = $83.00.

214.3. C. The error term represents all of the factors other than preparation time that influence the score

Stock & Watson: "The intercept and the slope are the coefficients of the population regression line, also known as the parameters of the population regression line. The slope is the change in Y associated with a unit change in X. The intercept is the value of the population regression line when X = 0; it is the point at which the population regression line intersects the Y axis. In some econometric applications, the intercept has a meaningful economic interpretation. In other applications, the intercept has no real-world meaning; for example, when X is the class size, strictly speaking the intercept is the predicted value of test scores when there are no students in the class! When the real-world meaning of the intercept is nonsensical, it is best to think of it mathematically as the coefficient that determines the level of the regression line. The term u(i) in Equation (4.5) is the error term. The error term incorporates all of the factors responsible for the difference between the ith district's average test score and the value predicted by the population regression line. This error term contains all the other factors besides X that determine the value of the dependent variable, Y, for a specific observation, i. In the class size example, these other factors include all the unique features of the ith district that affect the performance of its students on the test, including teacher quality, student economic background, luck, and even any mistakes in grading the test."
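
As a quick check on the arithmetic in 214.1 and 214.2, here is a minimal Python sketch; all inputs are taken directly from the question stems.

```python
# 214.1: beta = rho * volatility(stock)/volatility(market), so with
# cross-volatility held constant, beta scales with correlation.
beta_old, rho_old, rho_new = 1.50, 0.50, 0.80
beta_new = rho_new / rho_old * beta_old       # 2.40
market_risk_premium = 0.04
change_in_er = (beta_new - beta_old) * market_risk_premium
print(change_in_er)                           # 0.036, i.e., +3.6% (answer D)

# 214.2: AWE = 600 + 8.3*AGE, so a 10-year age gap is worth 10 * $8.3
print(8.3 * (45 - 35))                        # 83.0, i.e., $83.00 (answer B)
```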


Question 215: Properties of linear regression (Stock & Watson)

AIMs: Define and interpret a sample regression function, regression coefficients, parameters, slope and the intercept. Describe the key properties [assumptions] of a linear regression.

215.1. We regressed the monthly returns of Apple (AAPL) against the S&P 500 ($SPX) for the last thirty-six months ending January 31st; Apple's monthly return is the dependent variable (Y, regressand), the index's monthly return is the independent variable (X, regressor), and the number of paired observations is n = 36. In regard to the dependent variable, Apple's average monthly return over the period was +4.837% with a standard deviation of 6.686%. In regard to the independent variable, the average monthly return of the index was +1.69% with a standard deviation of 4.687%. The covariance between the two series, Covariance(X,Y), was 0.00216. What is the equation for the sample regression line? (note: I did use actual data, trying to keep it real folks!)

a) AAPL = 0.01 + 0.33*SPX b) AAPL = 0.02 + 0.67*SPX c) AAPL = 0.03 + 0.98*SPX d) AAPL = 0.04 + 1.29*SPX

215.2. A dataset consists of the price of gasoline (Price), the regressor, and the weekly household demand for gas in terms of gallons (Quantity), the regressand. An ordinary least squares (OLS) regression line produces the following demand function: Quantity = 11 - 1.5*Price. One of the datapoints in the scatterplot is a household that "demands" 8.0 gallons when the price is $3.00 per gallon; i.e., Quantity(i) = 8.0 gallons, Price(i) = $3.00. What is the residual of this observation, u(i)?

a) -1.5 b) zero c) +1.5 d) Impossible, the observation must lie on the line

215.3. Each of the following is a key property [assumption], according to Stock & Watson, of a linear regression EXCEPT for:

a) The conditional distribution of the error term, u(i), given X(i), has a mean of zero b) The variance of the conditional distribution of the error term given X(i), variance[u(i) | X(i) = x], converges to ZERO as sample size (n) and X(i) increase c) Each observation [X(i), Y(i)] for i = 1, ..., n, is independent and identically distributed (i.i.d.) d) Large outliers are unlikely; i.e., X and Y have nonzero finite kurtosis


Answers:

215.1. C. AAPL = 0.03 + 0.98*SPX

We need to apply: S&W (4.7): slope (B1) = covariance/variance <-- you should know this! S&W (4.8): intercept (B0) = average_Y - B1*average_X; i.e., the OLS line must pass through the point (average X, average Y). The slope (B1) = 0.00216/4.687%^2 = 0.983. The intercept (B0) = 4.837% - (0.98)(1.69%) = 0.032. As correlation is covariance/(StdDev * StdDev), the correlation = 0.00216/(6.686%*4.687%) = 0.6893. The R^2 = 0.6893^2 = 47.51%.

215.2. C. +1.5

The Predicted(i) = Q^(i) = 11 - 1.5*3 = 6.5 gallons. The residual, u(i), is the difference between the observed value and the predicted value. In this case, Q(i) - Q^(i) = 8.0 - 6.5 = 1.5. In regard to (D), please make sure you understand why (D) is utterly false: the OLS generates a series of conditional means, so it is impossible for the OLS line to run through all of the points. Notice that we know the OLS line runs through (average X, average Y), but even this point is not an observation itself! The OLS line may "travel" through none of the observations. It exists as the line that minimizes the sum of the squared residuals; i.e., the OLS line is derived to solve for MINIMUM[sum of series([Q(i) - Q^(i)]^2)].

215.3. B. An extended assumption is homoskedasticity; i.e., that the variance of the error term is CONSTANT. In regard to (A), (C) and (D), these are the three basic OLS assumptions in Stock & Watson:

1) The conditional distribution of the error term, u(i), given X(i), has a mean of zero: "The conditional distribution of u(i) given X(i) has a mean of zero. This assumption is a formal mathematical statement about the 'other factors' contained in u(i) and asserts that these other factors are unrelated to X(i) in the sense that, given a value of X(i), the mean of the distribution of these other factors is zero."

2) Each observation [X(i), Y(i)] for i = 1, ..., n, is independent and identically distributed (i.i.d.): "The assumption is that [X(i), Y(i)], i = 1, ..., n, are independently and identically distributed (i.i.d.) across observations. This is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then [X(i), Y(i)], i = 1, ..., n are i.i.d. ... The i.i.d. assumption is a reasonable one for many data collection schemes. For example, survey data from a randomly chosen subset of the population typically can be treated as i.i.d."

3) Large outliers are unlikely; i.e., X and Y have nonzero finite kurtosis: "The assumption is that large outliers—that is, observations with values of X(i), Y(i), or both that are far outside the usual range of the data—are unlikely. Large outliers can make OLS regression results misleading. In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments ... Another way to state this assumption is that X and Y have finite kurtosis."


Question 216: Regression sums of squares: ESS, SSR, and TSS

AIMs: Define and interpret the explained sum of squares (ESS), the total sum of squares (TSS), the sum of squared residuals (SSR), the standard error of the regression (SER), and the regression R^2.

216.1. For the last three years, we regressed monthly dollar change in gasoline prices (regressand; dependent) against the monthly change in oil prices (regressor; independent). The number of observations (n) is therefore 36. If the coefficient of determination (R^2) is 0.18 and the total sum of squares (TSS) is 3.23 dollars^2, what is the standard error of the regression (SER)?

a) $0.28 b) $0.42 c) $2.65 d) $3.23

216.2. We regressed daily returns of a stock (the regressand or dependent variable) against a market index (e.g., S&P 1500; regressor or independent variable). The regression produced a beta for the stock, with respect to the market index, of 1.050. The stock's volatility was 30.0% and the market's volatility was 20.0%. If the regression's total sum of squares (TSS) is 0.300, what is the regression's explained sum of squares (ESS)?

a) 0.0960 b) 0.1470 c) 0.4900 d) 1.2500

216.3. A five-year regression of monthly cotton price changes, such that the number of observations (n) equals 60, against average temperature changes produced a standard error of the regression (SER) of $1.20. If the total sum of squares (TSS) was 90.625 dollars^2, what is the implied correlation coefficient?

a) 0.08 b) 0.16 c) 0.28 d) 0.77


Answers:

216.1. A. $0.28

As R^2 = 1 - SSR/TSS, SSR = (1-R^2)*TSS. In this case, SSR = (1-0.18)*3.23 = 2.6486 dollars^2. SER = SQRT[SSR/(n-df)], where the df here is 2 because we have 2 coefficients (or 2 variables, if you like). Then, SER = SQRT(2.6486/34) = $0.279; i.e., the SER units are the same as the dependent variable.

216.2. B. 0.1470

As beta(stock, index) = covariance(stock, index)/variance(index) = correlation(stock, index)*volatility(stock)/volatility(index), it follows that: correlation(stock, index) = beta(stock, index)*volatility(index)/volatility(stock); in this case, correlation(stock, index) = 1.050*20%/30% = 0.70, and: R^2 = correlation^2 = 0.70^2 = 0.49. Since R^2 = ESS/TSS, ESS = R^2*TSS. In this case, ESS = 0.49*0.30 = 0.1470.

216.3. C. 0.28

As SER = SQRT[SSR/(n-df)], SSR = SER^2*(n-df). In this case (again, 2 coefficients = 2 df): SSR = 1.20^2*(60-2) = 83.52; R^2 = ESS/TSS = 1 - SSR/TSS = 1 - 83.52/90.625 = 0.07840; correlation = SQRT(0.07840) = 0.280.
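
All three answers follow from the identities TSS = ESS + SSR, R^2 = ESS/TSS, and SER = SQRT[SSR/(n-2)] for a single regressor; a minimal Python check with the question inputs:

```python
import math

# 216.1: SER from R^2 and TSS (n = 36; one regressor, so df = n - 2 = 34)
SSR = (1 - 0.18) * 3.23               # 2.6486 dollars^2
print(math.sqrt(SSR / 34))            # ~0.28 dollars (answer A)

# 216.2: correlation from beta and the two volatilities, then ESS = R^2 * TSS
rho = 1.050 * 0.20 / 0.30             # 0.70
print(rho**2 * 0.300)                 # 0.1470 (answer B)

# 216.3: SSR from the SER, then correlation = SQRT(1 - SSR/TSS)
SSR3 = 1.20**2 * (60 - 2)             # 83.52
print(math.sqrt(1 - SSR3 / 90.625))   # ~0.28 (answer C)
```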


Question 217: Regression coefficients (Stock & Watson)

AIMs: Define, calculate, and interpret confidence intervals for regression coefficients. Define and interpret hypothesis tests about regression coefficients.

217.1. We regressed a security's returns, R(i), against market index returns, M(i), in order to estimate the security's beta according to R(i) = intercept + beta*M(i). The sample size is 48. The regression output is: R(i) = 0.020 + 1.080*M(i). The standard error of the intercept, SE(intercept), is 0.030; the standard error of the beta, SE(beta), is 0.050. The two-sided null hypothesis is that the security's beta is one; i.e., the null is beta = 1.0. Do we reject the null at 95% confidence?

a) No, the t-statistic is 1.60 b) No, the t-statistic is 21.60 c) Yes, the t-statistic is 5.85 d) Yes, the t-statistic is 21.60

217.2. Assuming the relationship that Earnings = B(0) + B(1)*YearsEducation, hourly earnings ("Earnings" is the regressand or dependent variable) are regressed against years of education ("YearsEducation" is the regressor or independent variable). The OLS regression estimates are given by: Earnings = $3.80 + 2.10*YearsEducation. The standard errors are SE[B(0)] = 1.62 and SE[B(1)] = 0.28. What is the 95% confidence interval for the average hourly increase for each additional year of education; i.e., what is the confidence interval for the slope coefficient?

a) 1.38 < B(1) < 2.82 b) 1.44 < B(1) < 2.76 c) 1.55 < B(1) < 2.65 d) 1.64 < B(1) < 2.56

217.3. Let (X) represent a binary variable where either X = 1 if an obligor has a speculative-grade credit rating, or X = 0 if an obligor has an investment-grade credit rating (S&P BBB- or Moody's Baa3 or higher). We assume this regression, R(i) = B(0) + B(1)*X(i), such that returns, R(i), are greater for speculative-grade bonds. The resulting OLS estimate is given by: R(i) = 0.040 + 0.050*X(i). The standard errors are: SE[B(0)] = 0.060 and SE[B(1)] = 0.010. The two-sided null hypothesis is that credit rating has no impact on returns. With 95% confidence, do we reject the null?

a) No, the t-statistic is 0.050 b) No, the t-statistic is 1.050 c) Yes, the t-statistic is 2.0 d) Yes, the t-statistic is 5.0


Answers:

217.1. A. No, the t-statistic is 1.60

t-statistic = (1.08 - 1.0)/0.05 = 1.60. As the t-statistic does not exceed the two-sided 95% critical value of 1.96, we do not reject the null; i.e., the population beta may be 1.0. The sample size is not directly required; it merely informs the given standard error. We only need to see that the sample is large to treat the t-statistic as approximately normal. Please note that, if the null hypothesis were "the slope is zero," then the t-statistic would be (1.08 - 0)/0.05 = 21.60, and we would reject that null.

217.2. C. 1.55 < B(1) < 2.65

The 95% CI for the slope coefficient = B(1) +/- 1.96*SE[B(1)]. In this case, 95% CI = 2.10 +/- 1.96*0.28 = 1.55 to 2.65; i.e., 1.55 < B(1) < 2.65.

217.3. D. Yes, the t-statistic is 5.0

t-statistic = B(1)/SE[B(1)] = 0.050/0.010 = 5.0, which exceeds the critical value of 1.96.
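
Since all three answers use the same large-sample t-ratio, t = (estimate - null)/SE, and the CI = estimate +/- 1.96*SE, a quick Python check:

```python
# 217.1: null is beta = 1.0
t1 = (1.080 - 1.0) / 0.050                     # 1.60 < 1.96: fail to reject (answer A)

# 217.2: 95% CI for the slope = estimate +/- 1.96*SE
ci = (2.10 - 1.96 * 0.28, 2.10 + 1.96 * 0.28)  # (1.55, 2.65) (answer C)

# 217.3: null is B(1) = 0
t3 = (0.050 - 0) / 0.010                       # 5.0 > 1.96: reject (answer D)
print(t1, ci, t3)
```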


Question 218: Theory of Ordinary Least Squares (Stock & Watson)

AIMs: Define and differentiate between homoskedasticity and heteroskedasticity. Describe the implications of homoskedasticity and heteroskedasticity. Explain the Gauss-Markov Theorem and its limitations, and alternatives to the OLS.

218.1. We want to regress hourly Earnings (the regressand) against years of Education (the regressor) based on the following OLS regression model: Earnings(i) = B(0) + B(1)*Education(i) + u(i), where u(i) is the error term. After we run the regression, which of the following statements MOST NEARLY demonstrates homoskedasticity?

a) Education(i) is not a linear function of any other regressor b) Earnings(i) is independent of Education(i) c) The variance of the error, u(i), is independent of Education(i) d) The error term has a conditional mean of zero, E[u(i) | Education(i)] = 0

218.2. Assume we have confirmed that all three of Stock & Watson's assumptions are true for our OLS linear regression model; i.e., the error term has a mean of zero conditional on the regressor; the [X(i),Y(i)] observations are i.i.d. random draws; and large outliers are unlikely. Our OLS regression model is: Y(i) = B(0) + B(1)*X(i) + u(i). Each of the following is true EXCEPT for:

a) Whether the errors are homo- or heteroskedastic, the OLS estimators are unbiased, consistent and asymptotically normal

b) If the errors are heteroskedastic, we can compute heteroskedasticity-robust standard errors

c) If, in addition to the three assumptions above, the errors are homoskedastic, then our OLS estimator for B(1) is BLUE

d) As heteroskedasticity is a special case of homoskedasticity, and given that homoskedasticity is most prevalent, the safest practice is to employ homoskedasticity-robust standard errors


218.3. You presented a regression model to your boss, the Chief Risk Officer (CRO). She is a certified FRM so you know that she knows statistics, although she laments the decision to replace rigorous Gujarati with a softer, gentler Stock & Watson. She queries you on the dataset and your regression, and you admit to two realities: First, the error term is heteroskedastic. Second, there are many extreme outliers in the dataset. Your boss makes the following assertions:

I. "It is okay, for our purposes, that the error term is heteroskedastic: the slope (B1) estimator remains efficient and BLUE."

II. "Since we have many extreme outliers, the least absolute deviations (LAD) is a viable alternative to OLS, because its estimators may be more efficient (i.e., have smaller variances)"

Which of your boss' statements is (are) true?

a) Neither b) I. only c) II. only d) Both are true

Answers:

218.1. C. The variance of the error, u(i), is independent of (does not depend on) Education(i)

"The error term u(i) is homoskedastic if the variance of the conditional distribution of u(i) given X(i) [in this case, Education(i)] is constant for i = 1, ..., n and in particular does not depend on [the regressor; the independent variable]. Otherwise, the error term is heteroskedastic ... Homoskedasticity means that the variance of u(i) is unrelated to the value of [the regressor; the independent variable]. Heteroskedasticity means that the variance of u(i) is related to the value of [the regressor; the independent variable]." In regard to (A), there are no other regressors; but, if there were, this would refer to multicollinearity. In regard to (B), this is contrary to the model itself. In regard to (D), this is an OLS assumption!


218.2. D. The reverse. Homoskedasticity is the special case of heteroskedasticity; heteroskedasticity-robust standard errors are robust to homoskedasticity, but not the converse.

Stock & Watson: "Practical implications. The main issue of practical relevance in this discussion is whether one should use heteroskedasticity-robust or homoskedasticity-only standard errors. In this regard, it is useful to imagine computing both, then choosing between them. If the homoskedasticity-only and heteroskedasticity-robust standard errors are the same, nothing is lost by using the heteroskedasticity-robust standard errors; if they differ, however, then you should use the more reliable ones that allow for heteroskedasticity. The simplest thing, then, is always to use the heteroskedasticity-robust standard errors. For historical reasons, many software programs report homoskedasticity-only standard errors as their default setting, so it is up to the user to specify the option of heteroskedasticity-robust standard errors. The details of how to implement heteroskedasticity-robust standard errors depend on the software package you use. All of the empirical examples in this book employ heteroskedasticity-robust standard errors unless explicitly stated otherwise."

In regard to (A), (B) and (C), EACH is TRUE. "The Gauss-Markov Theorem for B(1): If the three least squares assumptions hold and if errors are homoskedastic, then the OLS estimator is the Best (most efficient) Linear conditionally Unbiased Estimator (is BLUE)."

218.3. C. II. only

In regard to (I), the "B" in BLUE refers to "best," which means most EFFICIENT (smallest variance among unbiased estimators); heteroskedasticity threatens the efficiency of the estimator. Stock and Watson: "The Gauss-Markov theorem provides a theoretical justification for using OLS. However, the theorem has two important limitations. First, its conditions might not hold in practice. In particular, if the error term is heteroskedastic--as it often is in economic applications--then the OLS estimator is no longer BLUE. As discussed in Section 5.4, the presence of heteroskedasticity does not pose a threat to inference based on heteroskedasticity-robust standard errors, but it does mean that OLS is no longer the efficient linear conditionally unbiased estimator. An alternative to OLS when there is heteroskedasticity of a known form, called the weighted least squares estimator, is discussed below. The second limitation of the Gauss-Markov theorem is that even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS. ... If extreme outliers are not rare [i.e., common or not uncommon], then other estimators can be more efficient than OLS and can produce inferences that are more reliable. One such estimator is the least absolute deviations (LAD)."
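
A minimal sketch of the practical advice above, assuming Python with numpy and statsmodels installed: simulate data whose error variance grows with the regressor, then compare the default (homoskedasticity-only) standard errors with heteroskedasticity-robust (HC1) standard errors. The data-generating values here are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 10.0, n)
u = rng.normal(0.0, 0.5 * x)        # heteroskedastic: error spread grows with x
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
fit_default = sm.OLS(y, X).fit()                # homoskedasticity-only SEs
fit_robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

print(fit_default.bse)   # default standard errors
print(fit_robust.bse)    # robust standard errors; here they differ noticeably
```

Both fits return the same coefficient estimates; only the standard errors (and therefore the t-statistics) change, which is exactly the point of the "always use robust standard errors" advice quoted above.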


Question 219: Omitted variable bias

AIMs: Define, interpret, and discuss methods for addressing omitted variable bias. Distinguish between simple and multiple regression. Define and interpret the slope coefficient in a multiple regression.

219.1. We regress a stock's returns against a market index according to the following OLS model: Return(i) = B(0) + B(1)*Index(i) + u(i). However, our regression is guilty of omitted variable bias. If our regression indeed suffers from omitted variable bias, which of the following is MOST likely true?

a) The OLS assumption that E[u(i) | X(i)] = 0 is incorrect b) The OLS assumption that [X(i), Y(i)], i = 1, ..., n are i.i.d. random draws is incorrect c) The OLS assumption that large outliers are unlikely is incorrect d) The assumption of no perfect multicollinearity is incorrect

219.2. Our multiple regression model regresses a (dependent) credit score against two (independent) regressors, Leverage and CashFlow. This multivariate OLS model is given by: Score(i) = B(0) + B(1)*Leverage(i) + B(2)*CashFlow(i) + u(i). Next, we omit CashFlow and only use a single regressor according to: Score(i) = B'(0) + B'(1)*Leverage(i) + u(i). Under which of the following conditions will there be omitted variable bias?

a) If Score(i) and Leverage(i) are correlated b) If Leverage(i) and CashFlow(i) are correlated c) If CashFlow(i) is a dummy variable d) If Score(i) is independent of CashFlow(i)

219.3. Data were collected from a random sample of 220 home sales from a community in 2003. Let 'Price' denote the selling price (in $1,000s), 'BDR' denote the number of bedrooms, 'Bath' denote the number of bathrooms, 'HSize' denote the size of the house (in square feet), 'LSize' denote the lot size (in square feet), 'Age' denote the age of the house (in years), and 'Poor' denote a binary variable that is equal to 1 if the condition of the house is reported as "poor." An estimated regression yields: Price = 119.2 + 0.485*BDR + 23.4*Bath + 0.156*HSize + 0.002*LSize + 0.090*Age - 48.8*Poor. Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house? [Source: S&W Question 6.5.b.]

a) Zero b) $17,500 c) $28,350 d) $39,000


Answers:

219.1. A. The OLS assumption that E[u(i) | X(i)] = 0 is incorrect

Stock & Watson: "Omitted variable bias occurs when two conditions are true: (1) when the omitted variable is correlated with the included regressor and (2) when the omitted variable is a determinant of the dependent variable. ... Omitted variable bias means that the first least squares assumption--that E[u(i) | X(i)] = 0, as listed in Key Concept 4.3--is incorrect. To see why, recall that the error term u(i) in the linear regression model with a single regressor represents all factors, other than X(i), that are determinants of Y(i). If one of these other factors is correlated with X(i), this means that the error term (which contains this factor) is correlated with X(i). In other words, if an omitted variable is a determinant of Y(i), then it is in the error term, and if it is correlated with X(i), then the error term is correlated with X(i). Because u(i) and X(i) are correlated, the conditional mean of u(i) given X(i) is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: The OLS estimator is biased. This bias does not vanish even in very large samples, and the OLS estimator is inconsistent."

219.2. B. If Leverage(i) and CashFlow(i) are correlated

This is the less intuitive of the TWO conditions for omitted variable bias. Stock & Watson: "Omitted variable bias occurs when two conditions are true:

1) When the omitted variable [in this case, CashFlow(i)] is correlated with the included regressor [in this case, Leverage(i)]; and

2) When the omitted variable [CashFlow(i)] is a determinant of the dependent variable [Score(i)]."

219.3. D. $39,000

Change in Bath = +1 and Change in HSize = +100, such that the expected change in price is 23.4*1 + 0.156*100 = 39.0 thousand dollars = +$39,000.
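
The mechanics of omitted variable bias are easy to see in a simulation. Below is a minimal Python/numpy sketch (all parameter values are hypothetical): Y depends on X1 and X2, X1 and X2 are correlated, and the short regression that omits X2 biases the X1 slope.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # condition 1: correlated with the included regressor
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)  # condition 2: x2 determines y

X_full = np.column_stack([np.ones(n), x1, x2])
X_short = np.column_stack([np.ones(n), x1])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# The full regression recovers the true slope (~2.0); the short regression is
# biased toward 2.0 + 3.0*0.8 = 4.4, and the bias does not shrink with n.
print(b_full[1], b_short[1])
```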


Question 220: OLS estimators in a multiple regression

AIMs: Define, calculate, and interpret measures of fit in multiple regression. Explain the concept of imperfect and perfect multicollinearity and their implications.

220.1. Each of the following is true about the adjusted R^2 EXCEPT which statement is false:

a) Adjusted R^2 = 1 - (SSR/TSS)*[(n-1)/(n-k-1)] b) Adding a regressor (independent variable) always causes the adjusted R^2 to decrease c) Adjusted R^2 is always less than R^2 d) The adjusted R^2 can be negative

220.2. A multiple regression model, on a small sample of monthly returns for one year, has two regressors and is given by: Y(i) = 10.0 + 1.46*X(1,i) - 0.82*X(2,i) + u(i). The number of observations (n) is 12. The sum of squared residuals (SSR) is 106.0. The total sum of squares (TSS) is 166.0. What are, respectively, the standard error of the regression (SER) and the adjusted R^2?

a) SER = 0.89 and Adjusted R^2 = -0.11 b) SER = 2.25 and Adjusted R^2 = 0.64 c) SER = 3.43 and Adjusted R^2 = 0.22 d) SER = 11.87 and Adjusted R^2 = 0.64

220.3. With respect to a linear regression with multiple regressors, each of the following is true EXCEPT which statement is false:

a) Imperfect multicollinearity implies that we cannot estimate precisely ANY of the partial effects (slope coefficients)

b) Imperfect multicollinearity means that two or more of the regressors are highly correlated

c) In contrast to perfect multicollinearity, imperfect multicollinearity is not necessarily an error but likely just a feature of the OLS

d) The dummy variable trap is an example of perfect multicollinearity


Answers:

220.1. B. Adding a regressor has an unclear impact on the adjusted R^2 (however, the adjusted R^2 is always LESS THAN the R^2).

In regard to (A), (C), and (D), EACH is TRUE. Stock & Watson: "There are three useful things to know about the adjusted R^2. First, (n-1)/(n-k-1) is always greater than 1, so adjusted R^2 is always less than R^2. Second, adding a regressor has two opposite effects on the adjusted R^2. On the one hand, the SSR falls, which increases the adjusted R^2. On the other hand, the factor (n-1)/(n-k-1) increases. Whether the adjusted R^2 increases or decreases depends on which of these two effects is stronger. Third, the adjusted R^2 can be negative. This happens when the regressors, taken together, reduce the sum of squared residuals by such a small amount that this reduction fails to offset the factor (n-1)/(n-k-1)."

220.2. C. SER = 3.43 and Adjusted R^2 = 0.22

We don't need the slope coefficients (aka, partial effects). SER = SQRT[SSR/(n-k-1)] = SQRT[106/(12-2-1)] = 3.43. Adjusted R^2 = 1 - SSR/TSS*[(n-1)/(n-k-1)] = 1 - 106/166*(11/9) = 0.22.

220.3. A. Imperfect multicollinearity implies that it will be difficult to estimate precisely one or more of the partial effects, but it does not necessarily challenge all of the slope coefficients.

In regard to (B), (C), and (D), each is TRUE. Stock & Watson: "Imperfect multicollinearity arises when one of the regressors is very highly correlated—but not perfectly correlated—with the other regressors. Unlike perfect multicollinearity, imperfect multicollinearity does not prevent estimation of the regression, nor does it imply a logical problem with the choice of regressors. However, it does mean that one or more regression coefficients could be estimated imprecisely. ... Despite its similar name, imperfect multicollinearity is conceptually quite different from perfect multicollinearity. Imperfect multicollinearity means that two or more of the regressors are highly correlated in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated. If the regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated." In regard to (D): "The dummy variable trap. Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors ... In general, if there are G binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all G binary variables are included as regressors, then the regression will fail because of perfect multicollinearity. This situation is called the dummy variable trap. The usual way to avoid the dummy variable trap is to exclude one of the binary variables from the multiple regression, so only G-1 of the G binary variables are included as regressors."
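
The 220.2 calculation as a short Python check, with n, k, SSR, and TSS taken from the question stem:

```python
import math

n, k = 12, 2
SSR, TSS = 106.0, 166.0
SER = math.sqrt(SSR / (n - k - 1))                # ~3.43
adj_r2 = 1 - (SSR / TSS) * (n - 1) / (n - k - 1)  # ~0.22 (answer C)
print(SER, adj_r2)
```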


Question 221: Joint null hypothesis in multiple OLS regression

AIMs: Construct, perform, and interpret hypothesis tests and confidence intervals for: a single coefficient in a multiple regression; and for multiple coefficients in a multiple regression. Define and interpret the F-statistic.

221.1. This question was sourced from Stock & Watson; the regression also applies to the next question. Data were collected from a random sample of 220 home sales. Price is the regressand (dependent variable) and denotes the selling price (in $1,000s). The regressors (independent variables) are: BDR is the number of bedrooms, Bath is the number of bathrooms, HSize is the size of the house (in square feet), LSize is the lot size (in square feet), Age is the age of the house (in years), and Poor is a binary variable that is equal to one (1) if the condition of the house is reported as "poor." The estimated regression (the same as in Question 219.3) is: Price = 119.2 + 0.485*BDR + 23.4*Bath + 0.156*HSize + 0.002*LSize + 0.090*Age - 48.8*Poor, where the standard error of the LSize coefficient is 0.000480.

If a homeowner purchases 1,000 square feet from an adjacent lot (i.e., +1,000 to her lot size), what is the 99% confidence interval for the change in value to her house?

a) +$76 to $324 b) +$275 to $1,512 c) +$763 to $3,236 d) +$1,255 to $4,871

221.2. Assume the same multiple regression as above and note the regression has six regressors. The F-statistic for omitting BDR and Age from the regression is 3.31. The following four critical values are provided to you from Table 4 in the Appendix: critical F(2 df, infinite) @ 5% = 3.00; F(2 df, infinite) @ 1% = 4.61; F(6 df, infinite) @ 5% = 2.01; F(6 df, infinite) @ 1% = 2.64. Are the coefficients on BDR and Age statistically different from zero at, respectively, the 5% and 1% levels?

a) No, they are not significant at either level b) They are not significant at 5%, but they are significant at 1% c) They are not significant at 1%, but they are significant at 5% d) Yes, they are significant at both 5% and 1%


221.3. In regard to hypothesis tests in a multiple regression, each of the following is true EXCEPT which of the following is false:

a) In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is calculated by dividing the estimate by its standard error

b) To test the joint null hypothesis at 5% significance, we can use the t-statistic to test each coefficient (one at a time) and, if any t-statistic exceeds 1.96, we can reject the joint null

c) When testing a joint hypothesis, we should use the F-statistic and reject at least one of the hypotheses if the statistic exceeds the critical value

d) When the number of restrictions is one (q=1), the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic; e.g., the critical F(1, infinite) at 5% = 1.96^2


Answers:

221.1. C. +$763 to $3,236

The 99.0% confidence interval for the partial slope (partial effect) coefficient, LSize, is given by: 0.002 +/- 0.000480 * 2.58 = 0.000764 to 0.003236 (or, less precisely, 0.000762 to 0.003238 is also fine). The effect of a 1,000 square-foot increase in lot size = +1,000 * (0.000764 to 0.003236) = +$764 to $3,236.

221.2. C. They are not significant at 1%, but they are significant at 5%

Although the regression has six regressors, we are restricting only two, such that the correct critical values are F(2 df, infinite). As the computed F-statistic of 3.31 is GREATER THAN the "lookup" 5% F(2 df, infinite) of 3.00, reject the null; i.e., they are significant with 95% confidence. As the computed F-statistic of 3.31 is LESS THAN the "lookup" 1% F(2 df, infinite) of 4.61, fail to reject the null; i.e., they are not significant with 99% confidence.

221.3. B. False. Each of (A), (C) and (D) is TRUE.

(A) is true: we use the t-statistic to test the significance of an INDIVIDUAL slope coefficient; however, we want the F-statistic to test the JOINT hypothesis. In regard to false (B), from Stock & Watson: "Why can't I just test the individual coefficients one at a time? Although it seems it should be possible to test a joint hypothesis by using the usual t-statistics to test the restrictions one at a time, the following calculation shows that this approach is unreliable. Specifically, suppose that you are interested in testing the joint null hypothesis in Equation (7.6) that B(1) = 0 and B(2) = 0. Let t(1) be the t-statistic for testing the null hypothesis that B(1) = 0 and let t(2) be the t-statistic for testing the null hypothesis that B(2) = 0. What happens when you use the 'one-at-a-time' testing procedure: Reject the joint null hypothesis if either t(1) or t(2) exceeds 1.96 in absolute value? Because this question involves the two random variables t(1) and t(2), answering it requires characterizing the joint sampling distribution of t(1) and t(2). As mentioned in Section 6.6, in large samples B(1) and B(2) have a joint normal distribution, so under the joint null hypothesis the t-statistics t(1) and t(2) have a bivariate normal distribution, where each t-statistic has mean equal to 0 and variance equal to 1. First consider the special case in which the t-statistics are uncorrelated and thus are independent. What is the size of the 'one at a time' testing procedure; that is, what is the probability that you will reject the null hypothesis when it is true? More than 5%! ... This 'one at a time' method rejects the null too often because it gives you too many chances: If you fail to reject using the first t-statistic, you get to try again using the second. If the regressors are correlated, the situation is even more complicated. The size of the 'one at a time' procedure depends on the value of the correlation between the regressors. Because the 'one at a time' testing approach has the wrong size--that is, its rejection rate under the null hypothesis does not equal the desired significance level--a new approach is needed ... Fortunately, there is another approach to testing joint hypotheses that is more powerful, especially when the regressors are highly correlated. That approach is based on the F-statistic."
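
A short check of 221.1 and 221.2 in Python, assuming scipy is available for the normal and F quantiles:

```python
from scipy import stats

# 221.1: 99% CI for the LSize coefficient (estimate 0.002, SE 0.000480),
# then scale by +1,000 square feet; Price is in $1,000s.
z = stats.norm.ppf(0.995)            # ~2.576
lo = (0.002 - z * 0.000480) * 1000   # change in Price (in $1,000s)
hi = (0.002 + z * 0.000480) * 1000
print(lo * 1000, hi * 1000)          # ~$763 to ~$3,237 in dollars (answer C)

# 221.2: large-sample critical values F(2, infinity)
print(stats.f.ppf(0.95, 2, 10**9))   # ~3.00 -> 3.31 rejects at 5%
print(stats.f.ppf(0.99, 2, 10**9))   # ~4.61 -> 3.31 fails to reject at 1% (answer C)
```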


Question 222: Homoskedasticity-only F-statistic

AIMs: Define, calculate, and interpret the homoskedasticity-only F-statistic. Describe and interpret tests of single restrictions involving multiple coefficients. Define and interpret confidence sets for multiple coefficients.

222.1. You estimate the relationship between a security's return and a market index under the assumption of homoskedasticity of the error terms. The regression output is as follows: Predicted[Return(i)] = 2.85% + 1.490*Index(i), and the standard error on the slope is 0.820. The homoskedasticity-only "overall" regression F-statistic, for the hypothesis that the regression R^2 is zero, is approximately which of the following?

a) 1.35 b) 1.82 c) 3.30 d) 10.90

222.2. You test the three-factor Fama-French model with a multiple OLS regression, which has three regressors: Return(i) = 1.2% + 0.38*HML + 1.23*SMB + 0.17*UMD and R^2 = 0.520, where HML is "high minus low" (book-to-market), SMB is "small minus big" (small capitalization), and UMD is "up minus down" (momentum). The number of observations, n, is 384. Then you perform a restricted regression which imposes the joint null hypothesis that the true coefficients on SMB and UMD are zero. The restricted OLS regression is given by Return(i) = 0.9% + 0.44*HML and R^2 = 0.490. Please note: as the unrestricted regression has three regressors and the restricted regression hypothesizes that two of the coefficients are zero, we have unrestricted k = 3 and the number of restrictions (q) = 2. What is the homoskedasticity-only F-statistic?

a) 1.7 b) 3.4 c) 11.9 d) 23.3

222.3. If our multiple regression has two coefficients (two regressors), a 95% confidence set most likely has which shape? (source: Stock & Watson quiz, modified)

a) Rectangle b) Ellipse c) Sphere d) Parabola


Answers:

222.1. C. 3.30

The t-statistic = (1.49 - 0)/0.82 = 1.81707; the F-statistic (in this special case of a single restriction) = 1.81707^2 = 3.3018. Stock & Watson: "The F-statistic when q = 1: When q = 1, the F-statistic tests a single restriction. Then the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic."

222.2. C. 11.9

F = [(unrestricted R^2 - restricted R^2)/q] / [(1 - unrestricted R^2) / (n - unrestricted k - 1)]. In this case, F = [(0.520 - 0.490)/2] / [(1 - 0.520) / (384 - 3 - 1)] = 11.8750.

222.3. B. Ellipse

Stock and Watson, p 229: "The confidence ellipse is a fat sausage with the long part of the sausage oriented in the lower-left/upper-right direction. The reason for this orientation is that the estimated correlation between B(1) and B(2) is positive ..."
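
Both F-statistics above are one-liners; a minimal Python check with the question inputs:

```python
# 222.1: single restriction (q = 1), so F = t^2
t = (1.490 - 0) / 0.820
print(t**2)                                   # ~3.30 (answer C)

# 222.2: homoskedasticity-only F from the restricted and unrestricted R^2
r2_u, r2_r, q, n, k = 0.520, 0.490, 2, 384, 3
F = ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))
print(F)                                      # 11.875 (answer C)
```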


Question 223: Adjusted R^2 in a multiple regression

AIMs: Interpret the R^2 and adjusted-R^2 in a multiple regression. Define and discuss omitted variable bias in multiple regressions.

223.1. Our multiple regression with four regressors returns a high R^2 of 0.72 and adjusted R^2 of 0.68. We add one regressor, for a total of five, and the OLS regression returns an even higher R^2 of 0.77 and adjusted R^2 of 0.71. We can draw the following four conclusions, each of which is true EXCEPT for:

a) We can conclude the additional (fifth) variable is significant b) We cannot conclude causality; i.e., we cannot assume the regressors are a true cause of the dependent variable c) We cannot conclude there is no omitted variable bias d) We cannot know that we have the most appropriate set of regressors

223.2. A multiple regression with 165 observations (n = 165) on five regressors (k = 5) produces a standard error of the regression (SER) equal to 7.0 and a total sum of squares (TSS) equal to 26,000. What is the regression's adjusted R^2?

a) 0.33 b) 0.48 c) 0.69 d) 0.72

223.3. In a multiple regression, consider two things that might occur with respect to an omitted variable:

I. At least one of the included regressors must be correlated with the omitted variable.

II. The omitted variable must be a determinant of the dependent variable, Y.

In order for omitted variable bias to arise, which of the above must be true?

a) Neither are conditions b) Only I. is a condition c) Only II. is a condition d) Both are conditions.


Answers:

223.1. A. False, we cannot conclude the added variable is significant.

Stock & Watson (summary): "There are four potential pitfalls to guard against when using the R^2 or adjusted R^2: 1. An increase in the R^2 or adjusted R^2 does not necessarily mean that an added variable is statistically significant. 2. A high R^2 or adjusted R^2 does not mean that the regressors are a true cause of the dependent variable. 3. A high R^2 or adjusted R^2 does not mean that there is no omitted variable bias. 4. A high R^2 or adjusted R^2 does not necessarily mean that you have the most appropriate set of regressors, nor does a low R^2 or adjusted R^2 necessarily mean that you have an inappropriate set of regressors."

223.2. C. 0.69

As SER = SQRT[SSR/(n-k-1)], SSR = SER^2*(n-k-1). In this case, SSR = 7^2*(165-5-1) = 7,791.0. Adjusted R^2 = 1 - [(n-1)/(n-k-1)] * (SSR/TSS) = 1 - (164/159)*7791/26000 = 0.691. (As usual, this is more difficult than a typical exam question, because it compounds two concepts.)

223.3. D. Both are conditions.

Stock and Watson, "Omitted Variable Bias in Multiple Regression": "Omitted variable bias is the bias in the OLS estimator that arises when one or more included regressors are correlated with an omitted variable. For omitted variable bias to arise, two things must be true: 1. At least one of the included regressors must be correlated with the omitted variable; and 2. The omitted variable must be a determinant of the dependent variable, Y."
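
The two-step calculation in 223.2 (recover SSR from the SER, then compute the adjusted R^2) as a short Python check:

```python
n, k = 165, 5
SER, TSS = 7.0, 26_000.0
SSR = SER**2 * (n - k - 1)                       # 7,791.0
adj_r2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS   # ~0.691 (answer C)
print(SSR, adj_r2)
```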