2 simple regression


Upload: georgettenicolaides

Posted on 23-Jun-2015


Page 1

Using Regression to Describe Relationships in Data (Ch 3)

A. Simple Regression **

B. Multiple Regression

Page 2

Using Simple Regression to Describe a Relationship

Regression analysis is a statistical technique used to describe relationships among variables.

The simplest case is one where a dependent (response) variable y may be related to an independent (explanatory, causal) variable x.

The equation expressing this relationship is the line:

$$y = b_0 + b_1 x$$

Page 3

Algebra refresher

Page 4

Graph of an Exact Relationship: y = 1 + 2x

[Plot: y versus x]

x    y
1    3
2    5
3    7
4    9
5    11
6    13
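As a quick check, every row of the table above can be regenerated from the equation. A minimal Python sketch; the helper name `line` is our own illustration, not something from the text:

```python
def line(b0: float, b1: float, x: float) -> float:
    """Hypothetical helper: evaluate the straight line y = b0 + b1*x."""
    return b0 + b1 * x

xs = [1, 2, 3, 4, 5, 6]
# Exact relationship from the slide: intercept b0 = 1, slope b1 = 2.
ys = [line(1, 2, x) for x in xs]
print(list(zip(xs, ys)))
```

Because the relationship is exact, every point falls on the line and there is nothing left over to explain.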

Page 5

What about this?

Page 6

Error in the Relationship

In real life, we usually do not have exact relationships.

Figure 3.2 shows a situation where y and x have a strong tendency to increase together, but the relationship is not perfect.

A good guess might be ŷ = 1 + 2.5x.

Page 7

Graph of a Relationship That Is NOT Exact

x    y
1    3
2    2
3    8
4    8
5    11
6    13

[Regression Plot of y versus x with fitted line y = -0.2 + 2.2x; S = 1.48324, R-Sq = 90.6%, R-Sq(adj) = 88.2%]

Let us assume this relationship is approximately y = 1 + 1.25x.
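The S and R-Sq values printed on the plot can be reproduced from the six data points. This sketch assumes the standard simple-regression definitions (error sum of squares with n − 2 degrees of freedom for S, total sum of squares about the mean for R-Sq), which we believe is what the software output uses; the variable names are our own.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 8, 8, 11, 13]
b0, b1 = -0.2, 2.2  # fitted line reported on the plot

n = len(xs)
y_bar = sum(ys) / n
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)      # error (residual) sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares about the mean

s = math.sqrt(sse / (n - 2))              # standard error of the regression
r_sq = 1 - sse / sst                      # fraction of variation in y explained
r_sq_adj = 1 - (sse / (n - 2)) / (sst / (n - 1))

print(f"S = {s:.5f}  R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```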

Page 8

Residuals

Residuals = errors = deviations. For any point, its residual is ê = y − ŷ (y minus y-hat). We want them as small as possible… why? Residuals can be positive or negative.

Page 9

Figure 3.3 Deviations From the Line

[Regression Plot of y versus x with fitted line y = -0.2 + 2.2x (S = 1.48324, R-Sq = 90.6%, R-Sq(adj) = 88.2%); points above the line have + deviations and points below it have - deviations]

Page 10

Computation Ideas (1)

We can search for a line that minimizes the sum of the residuals:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)$$

While this seems like a good idea, it can be shown that any line passing through the point $(\bar{x}, \bar{y})$ has this sum equal to 0, so the criterion cannot single out one best line.
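A small numerical illustration of that claim, using the data from Figure 3.2: pick any slope, force the line through $(\bar{x}, \bar{y})$ by setting the intercept to $\bar{y} - b_1\bar{x}$, and the residuals sum to zero (up to floating-point rounding). The slopes tried below are arbitrary choices of ours.

```python
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 8, 8, 11, 13]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

def sum_of_residuals(b0: float, b1: float) -> float:
    """Sum of (y_i - y_hat_i) for the line y_hat = b0 + b1*x."""
    return sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))

# Arbitrary slopes, including clearly bad ones like -1 and 10.
for b1 in [-1.0, 0.0, 2.2, 2.5, 10.0]:
    b0 = y_bar - b1 * x_bar  # forces the line through (x_bar, y_bar)
    print(f"b1 = {b1:5.1f}  sum of residuals = {sum_of_residuals(b0, b1):.10f}")
```

Even the slope of 10, which fits the data badly, "wins" by this criterion, which is why it is not used.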

Page 11

Computation Ideas (2)

We can work with absolute values and search for a line that minimizes:

$$\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

Such a procedure, called LAV or least absolute value regression, does exist but usually is found only in specialized software.
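The LAV criterion itself is easy to evaluate for a given line; what needs specialized software is searching over all lines for the minimum. This sketch only computes the criterion for two candidate lines from these slides (the least squares line and the guess ŷ = 1 + 2.5x); it does not find the LAV line.

```python
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 8, 8, 11, 13]

def sum_abs_residuals(b0: float, b1: float) -> float:
    """LAV criterion: sum of |y_i - y_hat_i| for y_hat = b0 + b1*x."""
    return sum(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys))

print(sum_abs_residuals(-0.2, 2.2))  # least squares line
print(sum_abs_residuals(1.0, 2.5))   # guessed line
```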

Page 12

Computation Ideas (3)

By far the most popular approach is to square the residuals and minimize:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

This procedure is called least squares and is widely available in software. It uses calculus to solve for the $b_0$ and $b_1$ terms and gives a unique solution.
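To see the least squares criterion at work, here is a sketch that computes the sum of squared residuals (SSE) for the fitted line and for the two guesses that appear on earlier slides. The function name `sse` is our own.

```python
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 8, 8, 11, 13]

def sse(b0: float, b1: float) -> float:
    """Sum of squared residuals for the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

candidates = {
    "least squares (-0.2 + 2.2x)": (-0.2, 2.2),
    "guess (1 + 2.5x)": (1.0, 2.5),
    "guess (1 + 1.25x)": (1.0, 1.25),
}
for label, (b0, b1) in candidates.items():
    print(f"{label}: SSE = {sse(b0, b1):.4f}")
```

Squaring makes every residual count positively, and large misses are penalized much more than small ones.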

Page 13

Least Squares Estimators

There are several formulas for the $b_1$ term. If doing it by hand, we might want to use:

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$

The intercept is $b_0 = \bar{y} - b_1 \bar{x}$.
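The hand formula translates directly into code: accumulate the four column sums, then apply the formulas for $b_1$ and $b_0$. A minimal sketch (the function name `least_squares` is ours), applied to the data used in these slides:

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (b0, b1) using the by-hand least squares formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: (sum xy - (sum x)(sum y)/n) / (sum x^2 - (sum x)^2/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4, 5, 6], [3, 2, 8, 8, 11, 13])
print(round(b0, 4), round(b1, 4))
```

On the exact data from the earlier graph (y = 1 + 2x) the same function recovers intercept 1 and slope 2.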

Page 14

Figure 3.5 Computations Required for b1 and b0

x_i      y_i    x_i^2    x_i*y_i
1        3      1        3
2        2      4        4
3        8      9        24
4        8      16       32
5        11     25       55
6        13     36       78
Totals:  21     45       91       196

Page 15

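Calculations

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{196 - \frac{(21)(45)}{6}}{91 - \frac{(21)^2}{6}} = \frac{196 - 157.5}{91 - 73.5} = \frac{38.5}{17.5} = 2.2$$

$$b_0 = \bar{y} - b_1 \bar{x} = 7.5 - 2.2(3.5) = -0.2$$

where $\bar{x} = 21/6 = 3.5$ and $\bar{y} = 45/6 = 7.5$.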

Page 16

The Unique Minimum

The line we obtained was:

$$\hat{y} = -0.2 + 2.2x$$

This is the best equation: no other line has a smaller sum of squared residuals.

We guessed y = 1 + 2.5x on slide 7.
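A quick numerical sanity check (not a proof) that $\hat{y} = -0.2 + 2.2x$ really is the minimum: nudge the intercept and slope by ±0.1 in every combination and confirm that each nudged line has a larger sum of squared residuals. The step size 0.1 is an arbitrary choice of ours.

```python
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 8, 8, 11, 13]

def sse(b0: float, b1: float) -> float:
    """Sum of squared residuals for the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

best = sse(-0.2, 2.2)
steps = [-0.1, 0.0, 0.1]
# Every neighbouring line (other than the fitted one itself) should do worse.
neighbours = [(d0, d1) for d0 in steps for d1 in steps if (d0, d1) != (0.0, 0.0)]
all_worse = all(sse(-0.2 + d0, 2.2 + d1) > best for d0, d1 in neighbours)
print(f"SSE at the fitted line = {best:.4f}; every nudge is worse: {all_worse}")
```

Because the least squares criterion is a bowl-shaped (quadratic) function of $b_0$ and $b_1$, the calculus solution is the single lowest point, which is what the check suggests locally.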

Page 17

We statistical software!

Page 18

Examples of Regression as a Descriptive Technique

SU is concerned about the cost of adding new computers to an existing network. They obtained data on 14 existing campus computer labs.

They ran a regression of cost (in $) on the number of computers.

Page 19

Pricing a Computer Network

ŷ [Cost] = 16594 + 650 [#computers]

Page 20

Interpreting the equation in words…

Slope:
– On average, each additional computer costs $650.
– Or: the cost of the project increases by $650 for each additional computer.

Intercept:
– The intercept is only meaningful if all of the following hold:
  – Is it a fixed cost?
  – Did we collect data at x = 0?
  – Does it make practical sense to build a network of 0 computers?

Prediction:
– On average, the cost for adding 10 computers is $23,094 ($16,594 + $650 × 10).

ŷ [Cost] = 16594 + 650 [#computers]
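The prediction above is just the fitted equation evaluated at 10 computers. A minimal sketch; `predicted_cost` is a hypothetical helper name of ours:

```python
def predicted_cost(num_computers: int) -> int:
    """Fitted equation from the slide: cost = 16594 + 650 * (#computers)."""
    return 16594 + 650 * num_computers

print(predicted_cost(10))
# The slope is the change in predicted cost for one more computer.
print(predicted_cost(11) - predicted_cost(10))
```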

Page 21

Ex. Estimating Residential Real Estate Values

The Tarrant County Appraisal District uses data such as house size, location and depreciation to help appraise property.

Here we look at how appraisal value ($) depends on size (sq feet) for a set of 100 homes.

The data are from 1990.

Page 22

Tarrant County Real Estate

[Plot: VALUE versus SIZE]

ŷ [Value] = -50035 + 72.8 [sq feet]

Page 23

Interpreting the equation in words

Slope:
– On average, each additional square foot increases the appraisal value of a house by $72.80.
– Better: on average, each additional 100 sq feet raises the appraisal value of a house by $7,280.
– Or: the appraisal value of a house rises by about $7,280 for each additional 100 square feet.

ŷ [Value] = -50035 + 72.8 [sq feet]

Page 24

Interpreting the equation in words

Intercept:
– Is this the "fixed appraisal value"?
– Is x = 0 within the range of the data?
– A house with zero square feet???

Prediction:
– The value of a 1,500 square foot house is $59,165 (-50,035 + 72.8 × 1,500) on average.
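The same arithmetic in code, including the "better" per-100-square-feet reading of the slope. `predicted_value` is a hypothetical helper name of ours:

```python
def predicted_value(sq_feet: float) -> float:
    """Fitted equation from the slide: value = -50035 + 72.8 * (sq feet)."""
    return -50035 + 72.8 * sq_feet

print(round(predicted_value(1500), 2))
# Effect of 100 extra square feet on the predicted value.
print(round(predicted_value(1600) - predicted_value(1500), 2))
```

Evaluating at 0 square feet gives -$50,035, a negative appraisal value, which illustrates why the intercept should not be interpreted on its own here.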

Page 25

Ex. Forecasting Housing Starts

Here we analyze the relationship between US housing starts and mortgage rates. The rate used is the US average for new home purchases.

Annual data from 1963 to 2002 are used.

Page 26

US Housing Starts

[Plot: STARTS versus RATES]

ŷ [starts] = 1726 - 22.2 [rates]

Page 27

…does a relationship exist?

The plot shows there is little relationship in these data.
– Is this reliable?
– What might improve this model?

Page 28

Predict the price of a diamond based on its carats: Price (y) v. Caratage (x)

Predictor   Coef         SE Coef    T        P
Constant    -259.62591   17.31886   -14.99   2.52E-19
Caratage    3721.0249    81.78588   45.497   6.75E-40
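In regression output laid out like this, the T column is the coefficient divided by its standard error. A short Python check of that relationship using the two rows above; the dictionary layout is our own. Reproducing the P column would also require the sample size (for the t distribution's degrees of freedom), which the slide does not show, so it is not attempted here.

```python
# (Coef, SE Coef) pairs copied from the diamond output above.
rows = {
    "Constant": (-259.62591, 17.31886),
    "Caratage": (3721.0249, 81.78588),
}

t_values = {name: coef / se for name, (coef, se) in rows.items()}
for name, t in t_values.items():
    print(f"{name}: T = {t:.3f}")
```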

Page 29

The dependent variable is Wins

Predictor          Coef      SE Coef   T       P
Constant           -79.63    40.45     -1.97   0.059
Field Goals Made   0.04119   0.01380   2.99    0.006

Page 30

Predict # calls by CSR in call center

The regression equation is
CALLS = 13.7 + 0.744 MONTHS

Predictor   Coef      SE Coef   T       P
Constant    13.671    1.427     9.58    0.000
MONTHS      0.74351   0.06666   11.15   0.000
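Using the fitted equation works the same way as in the earlier examples. This sketch uses the unrounded coefficients from the table; the input of 12 months is a hypothetical value chosen for illustration, and `predicted_calls` is our own helper name.

```python
def predicted_calls(months: float) -> float:
    """CALLS = 13.671 + 0.74351 * MONTHS, using the unrounded table coefficients."""
    return 13.671 + 0.74351 * months

# Hypothetical CSR with MONTHS = 12.
print(round(predicted_calls(12), 1))
```

Each additional month adds about 0.744 calls to the prediction, on average.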

Page 31

Assignment

– Study Text 3.1–3.2 and lecture notes.
– Study table 3.14 on page 84.
– We will start HW/Lab 1 together.
– A short quiz will cover the lab, text, and lecture material.
– Review Text Chapter 2.1–2.2 and the PPt Review of Basic Statistics 2.1–2.2 (why? Questions on descriptive statistics and plots will also be on the quiz!)