Regression Models

Part 5: Functional Form
Professor William Greene, Stern School of Business, IOMS Department, Department of Economics


Post on 25-Feb-2016



TRANSCRIPT

Page 1: Regression Models

Part 5: Functional Form 5-1/36

Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Page 2: Regression Models


Regression and Forecasting Models

Part 5 – Elasticities and Functional Form

Page 3: Regression Models


Linear Regression Models

Model building
  Linear models – cost functions
  Semilog models – growth models
  Logs and elasticities

Analyzing residuals
  Violations of assumptions
  Unusual data points
  Hints for improving the model

Page 4: Regression Models


Using and Interpreting the Model

Interpreting the linear model

Semilog and growth models

Log-log model and elasticities

Page 5: Regression Models


Statistical Cost Analysis

[Figure: Fitted line plot of Cost versus Output. Cost = 2.444 + 0.005291 Output; S = 20.5111, R-Sq = 92.4%, R-Sq(adj) = 92.3%.]

Generation cost ($M) and output (millions of KWH) for 123 American electric utilities (1970).

The units of the LHS and RHS must be the same.

$M cost = b0 + b1 × MKWH
Y = $M cost
b0 = $M cost = 2.444 $M
b1 = $M/MKWH = 0.005291 $M/MKWH

So:
b0 = fixed cost = total cost if MKWH = 0
b1 = marginal cost = dCost/dMKWH
b1 × MKWH = variable cost
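The decomposition into fixed and variable cost can be sketched in a few lines of Python. This is only an illustration using the two coefficients from the fitted line; the function name `predicted_cost` and the output level of 10,000 MKWH are assumptions chosen for the example, not values from the data.

```python
# Coefficients from the fitted line on the slide: Cost = 2.444 + 0.005291 Output
B0 = 2.444       # fixed cost, in $M
B1 = 0.005291    # marginal cost, in $M per MKWH


def predicted_cost(output_mkwh):
    """Return (fixed, variable, total) predicted cost in $M for an output in MKWH."""
    fixed = B0
    variable = B1 * output_mkwh
    return fixed, variable, fixed + variable


# 10,000 MKWH is an arbitrary illustrative output level, not an observation.
fixed, variable, total = predicted_cost(10_000)
print(f"fixed = {fixed:.3f} $M, variable = {variable:.3f} $M, total = {total:.3f} $M")
```

At zero output the variable part vanishes and the prediction equals b0, which is why b0 is read as fixed cost.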

Page 6: Regression Models


In the millennial edition of its World Health Report, in 2000, the World Health Organization published a study that compared the successes of the health care systems of 191 countries. The results notoriously ranked the United States a dismal 37th, between Costa Rica and Slovenia. The study was widely misrepresented, universally misunderstood and was, in fact, unhelpful in understanding the different outcomes across countries. Nonetheless, the result remains controversial a decade later, as policy makers argue about why the world’s most expensive health care system isn’t the world’s best.

Page 7: Regression Models


Application: WHO

WHO data on 191 countries in 1995-1999.
Analysis of Disability Adjusted Life Expectancy = DALE
EDUC = average years of education

DALE = β0 + β1 EDUC + ε

Page 8: Regression Models


The (Famous) WHO Data

Page 9: Regression Models


The slope is the interesting quantity. Each additional year of education is associated with an increase of 3.611 in disability adjusted life expectancy.


Page 13: Regression Models


An increase in per capita GDP of 1,000 PPP units is associated with an increase in ‘happy’ of 1000(0.0018566) = 1.86. Happy ranges from 0 to 100.

Page 14: Regression Models


Semilog Models and Growth Rates

[Figure: Fitted line plot of LogSalary versus YEARS. LogSalary = 9.841 + 0.04998 YEARS; S = 0.154111, R-Sq = 86.4%, R-Sq(adj) = 86.1%.]

LogSalary = 9.84 + 0.05 Years + e

Conclusion: The slope is the growth rate per period, or the (approximate) proportional increase for a 1 unit change in the "x."

Page 15: Regression Models


Salary = exp(9.84 + 0.05 Years)

Years = 0 at starting salary: Salary = exp(9.84) = 18,770

Marginal change. From year t to year t+1, log Salary goes up by 0.05. Salary changes from exp(9.84 + 0.05Y) to exp(9.84 + 0.05(Y+1)).

Say we go from year 10 to year 11. Salary goes from exp(9.84 + 0.05(10)) to exp(9.84 + 0.05(11)), or from 30,946.03 to 32,532.67, which is an increase of 5.13%. The increase will be the same from any year to the next.
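The salary arithmetic can be checked with a short Python sketch. It assumes the rounded coefficients 9.84 and 0.05 from the slide; the helper names `salary` and `pct_increase` are made up for this illustration.

```python
import math

B0, B1 = 9.84, 0.05   # rounded coefficients from LogSalary = 9.84 + 0.05 Years


def salary(years):
    """Predicted salary implied by the semilog model."""
    return math.exp(B0 + B1 * years)


def pct_increase(year):
    """Percentage change in predicted salary from `year` to `year + 1`."""
    return 100.0 * (salary(year + 1) / salary(year) - 1.0)


print(salary(0))                            # starting salary
print(salary(10), salary(11))               # the year 10 to year 11 example
print(pct_increase(10), pct_increase(25))   # identical: growth does not depend on the year
```

The ratio salary(t+1)/salary(t) equals exp(0.05) for every t, which is why the percentage increase is constant.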

Page 16: Regression Models


Semilog Model for Fuel Bills

[Figure: Scatterplot of logFuel versus ROOMS.]

Each increase of 1 room raises the fuel bill by about 21%. [Actually closer to exp(.215)-1 = 24%.]
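The gap between the approximate and the exact percentage effect can be reproduced directly. The sketch below assumes the slope is 0.215, the value used in the bracketed note; `exact_pct_effect` is a hypothetical helper name.

```python
import math


def exact_pct_effect(slope):
    """Exact % change in y for a one-unit increase in x when log(y) is linear in x."""
    return 100.0 * (math.exp(slope) - 1.0)


slope = 0.215  # slope used in the bracketed note on the slide
print(f"approximate: {100 * slope:.1f}%, exact: {exact_pct_effect(slope):.1f}%")
```

The approximation 100 × slope is close when the slope is small and understates the exact effect as the slope grows.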

Page 17: Regression Models


Using Semilog Models for Trends

[Figure: Scatterplot of Flights versus Month.]

Frequent Flyer Flights for 72 Months. (Text, Ex. 11.1, p. 508)

Page 18: Regression Models


Regression Approach

logFlights = β0 + β1 Months + ε
b0 = 2.770, b1 = 0.03710, s = 0.06102

[Figure: Fitted line plot of LogFlights versus Month. LogFlights = 2.770 + 0.03710 Month; S = 0.247017, R-Sq = 90.9%, R-Sq(adj) = 90.8%.]
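A minimal sketch of the regression approach to a trend, in plain Python. The flights data are not reproduced in the transcript, so the series below is synthetic and noise-free, built from the slide's coefficients purely to show the mechanics; `ols_fit` is a hypothetical helper, and the month 73 forecast is an illustration rather than a result from the text.

```python
import math


def ols_fit(x, y):
    """Least squares intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Synthetic, noise-free series built from the slide's coefficients (NOT the real data).
months = list(range(1, 73))
flights = [math.exp(2.770 + 0.03710 * m) for m in months]

# Regress log(Flights) on Month to estimate the trend.
b0, b1 = ols_fit(months, [math.log(f) for f in flights])
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}")

# Forecast one month past the sample, converted back from logs to levels.
forecast = math.exp(b0 + b1 * 73)
print(f"forecast for month 73: {forecast:.1f} flights")
```

With real data the fit would not be exact, and converting a log forecast back to levels by exponentiating is a simplification.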

Page 19: Regression Models


Elasticity and Loglinear Models

log y = β0 + β1 log x + ε

The “responsiveness” of one variable to changes in another.
E.g., in economics: demand elasticity = (%ΔQ) / (%ΔP)
Math: ratio of percentage changes
  %ΔQ / %ΔP = {100% [(ΔQ)/Q]} / {100% [(ΔP)/P]}
Units of measurement and the 100% fall out of this equation.
Elasticity = (ΔQ/ΔP) × (P/Q)
Elasticities are units free.
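The claim that elasticities are units free can be illustrated with a small Python sketch. The prices and quantities are hypothetical, and `elasticity` is a helper written for this example that evaluates (ΔQ/ΔP) × (P/Q) at the starting point.

```python
def elasticity(q0, q1, p0, p1):
    """Elasticity (dQ/dP) * (P/Q), evaluated at the starting values (p0, q0)."""
    return ((q1 - q0) / (p1 - p0)) * (p0 / q0)


# Hypothetical numbers: price rises from $10 to $11, quantity falls from 200 to 190 units.
e_dollars = elasticity(200, 190, 10.0, 11.0)

# Same situation with price in cents and quantity in dozens: the units cancel.
e_rescaled = elasticity(200 / 12, 190 / 12, 1000.0, 1100.0)

print(e_dollars, e_rescaled)
```

Both calls describe the same change, so they return the same elasticity up to floating point rounding.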

Page 20: Regression Models


Monet Regression


Page 23: Regression Models


Using the Residuals

How do you know the model is “good?”
Various diagnostics to be developed over the semester.
But the first place to look is at the residuals.

Page 24: Regression Models


Residuals Can Signal a Flawed Model

Standard application: Cost function for output of a production process.

Compare linear equation to a quadratic model (in logs)

(123 American Electric Utilities)

Page 25: Regression Models


Electricity Cost Function

Page 26: Regression Models


Candidate Model for Cost

log c = a + b log q + e

Page 27: Regression Models


A Better Model?

Log Cost = α + β1 logOutput + β2 [logOutput]^2 + ε

Page 28: Regression Models


Candidate Models for Cost

The quadratic equation is the appropriate model.

log c = b0 + b1 log q + b2 (log q)^2 + e

Page 29: Regression Models


Missing Variable Included

[Figure: Residuals versus logOutput (response is logCost). Residuals from the quadratic cost model.]

[Figure: Residuals versus logOutput (response is logCost). Residuals from the linear cost model.]
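Why the residuals from the linear model show a pattern can be sketched with synthetic numbers. The example below assumes a noise-free quadratic relationship in logs with made-up coefficients (these are not estimates from the utility data) and fits the misspecified linear-in-logs model to it; `ols_fit` is a hypothetical helper.

```python
def ols_fit(x, y):
    """Least squares intercept and slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical coefficients for log c = a + b1*log q + b2*(log q)^2 (NOT estimates).
A, B1, B2 = 1.0, 0.4, 0.05
log_q = [0.5 * k for k in range(1, 25)]               # log output from 0.5 to 12
log_c = [A + B1 * v + B2 * v ** 2 for v in log_q]     # noise-free quadratic cost

# Fit the linear-in-logs model, which omits the squared term, and inspect its residuals.
b0, b1 = ols_fit(log_q, log_c)
residuals = [c - (b0 + b1 * v) for v, c in zip(log_q, log_c)]

for v, r in zip(log_q[::4], residuals[::4]):
    print(f"logOutput = {v:5.1f}   residual = {r:+.3f}")
```

The residuals are positive at both ends and negative in the middle: the omitted squared term shows up as a U shape rather than as random scatter.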

Page 30: Regression Models


Unusual Data Points

[Figure: Regression of Foreign Box Office on Domestic. Overseas = 6.693 + 1.051 Domestic; S = 73.0041, R-Sq = 52.2%, R-Sq(adj) = 52.1%.]

Outliers have (what appear to be) very large disturbances, ε

The 500 most successful movies

Page 31: Regression Models


Outliers

[Figure: Regression of Foreign Box Office on Domestic. Overseas = 6.693 + 1.051 Domestic; S = 73.0041, R-Sq = 52.2%, R-Sq(adj) = 52.1%.]

Remember the empirical rule, that 99.7% of observations will lie within mean ± 3 standard deviations? We show (b0 + b1x) ± 3se below.

Titanic is 8.1 standard deviations from the regression! Only 0.86% of the 466 observations lie outside the bounds. (We will refine this later.)

These points might deserve a closer look.
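The ± 3se screen can be sketched as follows. The fitted line and S are taken from the plot; the three (domestic, overseas) pairs are hypothetical values in $M invented for the illustration, not the actual movie data, and `standardized_distance` is a made-up helper name.

```python
B0, B1, S = 6.693, 1.051, 73.0041   # fitted line and S from the slide


def standardized_distance(domestic, overseas):
    """Residual from the fitted line, measured in units of S."""
    return (overseas - (B0 + B1 * domestic)) / S


# Hypothetical (domestic, overseas) box office pairs in $M; NOT the actual movie data.
movies = {"A": (100.0, 120.0), "B": (200.0, 210.0), "C": (600.0, 1230.0)}

# Keep only the observations lying outside (b0 + b1*x) +/- 3*S.
flagged = {name: round(standardized_distance(d, o), 2)
           for name, (d, o) in movies.items()
           if abs(standardized_distance(d, o)) > 3}
print(flagged)
```

Only the observation far above the line is flagged; a flag marks a point for a closer look and is not by itself a reason to delete it.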

Page 32: Regression Models


logPrice = b0 + b1 logArea + e

Prices paid at auction for Monet paintings vs. surface area (in logs)

Not an outlier: Monet chose to paint a small painting.
Possibly an outlier: Why was the price so low?

Page 33: Regression Models


What to Do About Outliers

(1) Examine the data.
(2) Are they due to measurement error or obvious “coding errors?” Delete the observations.
(3) Are they just unusual observations? Do nothing.
(4) Generally, resist the temptation to remove outliers, especially if the sample is large. (500 movies is large.)
(5) Question why you think it is an outlier. Is it really?

Page 34: Regression Models


Regression Options

Page 35: Regression Models


Minitab’s Opinions

Minitab uses ± 2S to flag “large” residuals.

Influential observations have very large values of | x_i - x̄ |.
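One way to see why distance from the mean of x matters is the leverage of each observation. The sketch below uses the standard simple-regression formula h_i = 1/n + (x_i - x̄)^2 / Sxx, which does not appear on the slide and is brought in here as an assumption about what sits behind the statement; the x values are hypothetical.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / S_xx for a simple regression with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


# Hypothetical x values; the last one sits far from the others.
x = [10.0, 12.0, 11.0, 13.0, 9.0, 40.0]
for xi, h in zip(x, leverages(x)):
    print(f"x = {xi:5.1f}   leverage = {h:.3f}")
```

The leverages always sum to 2 in a simple regression (one for the intercept, one for the slope), so an observation far from x̄ takes a large share of that total.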

Page 36: Regression Models


On Removing Outliers

Be careful about singling out particular observations this way.

The resulting model might be a product of your opinions, not the real relationship in the data.

Removing outliers might create new outliers that were not outliers before.

Statistical inferences from the model will be incorrect.


Page 42: Regression Models


Correlation?
