
CHAPTER 22

Bivariate Analysis: Measures of Association

WHAT YOU WILL LEARN IN THIS CHAPTER:

To give examples of the types of business questions that may be answered by analyzing the association between two variables.

To list the common procedures for measuring association and to discuss how the measurement scale will influence the selection of statistical tests.

To discuss the concept of the simple correlation coefficient.

To calculate a simple correlation coefficient and a coefficient of determination.

To understand that correlation does not mean causation.

To interpret a correlation matrix.

To explain the concept of bivariate linear regression.

To identify the intercept and slope coefficients.

To discuss the least-squares method of regression analysis.

To draw a regression line.

To test the statistical significance of a least-squares regression.

To calculate the intercept and slope coefficients in a bivariate linear regression.

To interpret analysis of variance summary tables for linear regression.


EXHIBIT 22.2 Bivariate Analysis - Common Procedures for Testing Association

Interval and ratio scales*
  Measure of association: Correlation coefficient (Pearson's r); bivariate regression analysis
  Sample question: Are dollar sales associated with advertising dollar expenditures?

Ordinal scales*
  Measure of association: Chi-square; Spearman rank correlation; Kendall's rank correlation
  Sample question: Is rank preference for shopping centers associated with Likert scale ranking of convenience of locations?

Nominal scales*
  Measure of association: Chi-square; phi coefficient; contingency coefficient
  Sample question: Is sex associated with brand awareness (aware/not aware)?

*If at least one of the two variables has a given level of measurement, the appropriate procedure is the one with the fewest assumptions about the data.

simple correlation coefficient: A statistical measure of the covariation of, or association between, two variables.

SIMPLE CORRELATION COEFFICIENT

The most popular technique that indicates the relationship of one variable to another is simple correlation analysis. The simple correlation coefficient is a statistical measure of the covariation or association between two variables. The correlation coefficient (r) ranges from +1.0 to -1.0. If the value of r is 1.0, there is a perfect positive linear (straight-line) relationship. If the value of r is -1.0, a perfect negative linear relationship or a perfect inverse relationship is indicated. No correlation is indicated if r = 0. A correlation coefficient indicates both the magnitude of the linear relationship and the direction of the relationship. For example, if we find that the value of r = -.92, we know we have a relatively strong inverse relationship. That is, the greater the value measured by variable X, the less the value measured by variable Y.

The formula for calculating the correlation coefficient for two variables X and Y is:

r_{xy} = r_{yx} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}

where the symbols \bar{X} and \bar{Y} represent the sample means of X and Y, respectively.


An alternative way of expressing the correlation formula is:

r_{xy} = r_{yx} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2 \sigma_y^2}}

where

\sigma_x^2 = variance of X
\sigma_y^2 = variance of Y
\sigma_{xy} = covariance of X and Y

with

\sigma_{xy} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n}

If associated values of X_i and Y_i differ from their means in the same direction, then their covariance will be positive. Covariance will be negative if the values of X_i and Y_i have a tendency to deviate in opposite directions.
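The covariance and correlation formulas above translate directly into code. The following sketch is my own illustration, not part of the text; the data values are hypothetical. It computes the sample covariance with the n-denominator form shown above and the Pearson correlation coefficient from the definitional formula.

# Minimal sketch of the covariance and correlation formulas above.
# Data values are hypothetical; only the formulas come from the text.
import math

def covariance(x, y):
    """sigma_xy = sum((Xi - Xbar)(Yi - Ybar)) / n"""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

def pearson_r(x, y):
    """r_xy = sum((Xi - Xbar)(Yi - Ybar)) / sqrt(sum((Xi - Xbar)^2) * sum((Yi - Ybar)^2))"""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example: two variables that move in opposite directions.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.5, 7.8, 6.9, 5.1, 3.2]
print(covariance(x, y))   # negative covariance: deviations move in opposite directions
print(pearson_r(x, y))    # close to -1.0: a strong inverse linear relationship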

EXHIBIT 22.3 Scatter Diagrams Illustrating Correlation Patterns

[Six scatter diagrams, omitted here: r = .30 (low positive correlation), r = .80 (high positive correlation), r = +1.0 (perfect positive correlation), r = 0 (no correlation), r = -.60 (moderate negative correlation), and r = -1.0 (perfect negative correlation).]


In actuality, the simple correlation coefficient is a standardized measure of covariance. In the formula the numerator represents covariance and the denominator is the square root of the product of the sample variances. Researchers find the correlation coefficient useful because two correlations can be compared without regard to the amount of variation exhibited by each variable separately.

Exhibit 22.3 illustrates the correlation coefficients and scatter diagrams for several sets of data.

An Example

To illustrate the calculation of the correlation coefficient, an investigation is made to determine if the average number of hours worked in manufacturing industries is related to unemployment. A correlation analysis on the data in Table 22.1 is used to determine if the two variables are associated.

The correlation between the two variables is -.635, which indicates an inverse relationship. Thus when the number of hours worked is high, unemployment is low. This makes intuitive sense. If factories are increasing output, regular workers typically work more overtime and new employees are hired (reducing the unemployment rate). Both variables are probably related to overall economic conditions.

Correlation and Causation

It is important to remember that correlation does not mean causation. No matter how highly correlated the rooster's crow is to the rising of the sun, the rooster does not cause the sun to rise. It has been pointed out that there is a high correlation between teachers' salaries and the consumption of liquor over a period of years. The approximate correlation coefficient is r = .9. This high correlation does not indicate that teachers drink, nor does it indicate that the sale of liquor increases teachers' salaries. It is more likely that both teachers' salaries and liquor sales covary because they are both influenced by a third variable, such as long-run growth in national income and/or population.

In this example the relationship between the two variables is apparent but not real. Even though the variables are not causally related, they can be statistically related. This can occur because both are caused by a third (or more) factor(s). When this is so, the variables are said to be spuriously related.


Researchers who examine statistical relationships must be aware that the variables may not be causally related.

coefficient of determination (r²): A measure of that portion of the total variance of a variable that is accounted for by knowing the value of another variable.

TABLE 22.1 Correlation Analysis of Number of Hours Worked in Manufacturing Industries with Unemployment Rate

Unemployment Rate (X_i)   Hours Worked (Y_i)   X_i - X̄   (X_i - X̄)²   Y_i - Ȳ   (Y_i - Ȳ)²   (X_i - X̄)(Y_i - Ȳ)
5.5    39.6     0.51    0.2601   -0.71    0.5041   -0.3621
4.4    40.7    -0.59    0.3481    0.39    0.1521   -0.2301
4.1    40.4    -0.89    0.7921    0.09    0.0081   -0.0801
4.3    39.8    -0.69    0.4761   -0.51    0.2601    0.3519
6.8    39.2     1.81    3.2761   -1.11    1.2321   -2.0091
5.5    40.3     0.51    0.2601   -0.01    0.0001   -0.0051
5.5    39.7     0.51    0.2601   -0.61    0.3721   -0.3111
6.7    39.8     1.71    2.9241   -0.51    0.2601   -0.8721
5.5    40.4     0.51    0.2601    0.09    0.0081    0.0459
5.7    40.5     0.71    0.5041    0.19    0.0361    0.1349
5.2    40.7     0.21    0.0441    0.39    0.1521    0.0819
4.5    41.2    -0.49    0.2401    0.89    0.7921   -0.4361
3.8    41.3    -1.19    1.4161    0.99    0.9801   -1.1781
3.8    40.6    -1.19    1.4161    0.29    0.0841   -0.3451
3.6    40.7    -1.39    1.9321    0.39    0.1521   -0.5421
3.5    40.6    -1.49    2.2201    0.29    0.0841   -0.4321
4.9    39.8    -0.09    0.0081   -0.51    0.2601    0.0459
5.9    39.9     0.91    0.8281   -0.41    0.1681   -0.3731
5.6    40.6     0.61    0.3721    0.29    0.0841    0.1769

X̄ = 4.99     Ȳ = 40.31

Σ(X_i - X̄)² = 17.8379     Σ(Y_i - Ȳ)² = 5.5899     Σ(X_i - X̄)(Y_i - Ȳ) = -6.3389

r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} = \frac{-6.3389}{\sqrt{(17.8379)(5.5899)}} = \frac{-6.3389}{9.986} = -.635


Coefficient of Determination

If we wish to know the proportion of variance in Y explained by X (or vice versa), we can calculate the coefficient of determination by squaring the correlation coefficient (r²):

r^2 = \frac{\text{Explained variance}}{\text{Total variance}}
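The hand calculation in Table 22.1 can also be reproduced with a short script. The sketch below is my own illustration, not part of the original text; it uses the nineteen unemployment and hours-worked pairs from the table and should return r of about -.635 and a coefficient of determination of about .40.

# Sketch reproducing the Table 22.1 correlation (data values taken from the table).
import math

unemployment = [5.5, 4.4, 4.1, 4.3, 6.8, 5.5, 5.5, 6.7, 5.5, 5.7,
                5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, 5.9, 5.6]           # X
hours_worked = [39.6, 40.7, 40.4, 39.8, 39.2, 40.3, 39.7, 39.8, 40.4, 40.5,
                40.7, 41.2, 41.3, 40.6, 40.7, 40.6, 39.8, 39.9, 40.6]  # Y

n = len(unemployment)
x_bar = sum(unemployment) / n                      # about 4.99
y_bar = sum(hours_worked) / n                      # about 40.31

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, hours_worked))
sxx = sum((x - x_bar) ** 2 for x in unemployment)  # about 17.84
syy = sum((y - y_bar) ** 2 for y in hours_worked)  # about 5.59

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))       # -0.635  (inverse relationship)
print(round(r ** 2, 3))  # 0.403   (coefficient of determination)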


TABLE 22.2 Pearson Product-Moment Correlation Matrix for Sales Management Example*

Variables                     S      JS     GE     SE     OD     VI     JT     RA     TP     WL
S  Performance                1.00
JS Job satisfaction           .45b   1.00
GE Generalized self-esteem    .31b   .10    1.00
SE Specific self-esteem       .61b   .28b   .36b   1.00
OD Other-directedness         .05   -.03   -.44b  -.24c   1.00
VI Verbal intelligence       -.36b  -.13   -.14   -.11   -.18d   1.00
JT Job-related tension       -.48b  -.56b  -.32b  -.34b   .26b  -.02    1.00
RA Role ambiguity            -.26c  -.24c  -.32b  -.39b   .38b  -.05    .44b   1.00
TP Territory potential        .49b   .31b   .04    .29b   .09   -.09   -.38b  -.26b   1.00
WL Workload                   .45b   .11    .29c   .29c  -.04   -.12   -.27c  -.22d   .49b   1.00

*Numbers below the diagonal are for the sample. Those above the diagonal are omitted.
p < .05.
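A correlation matrix such as Table 22.2 is simply the set of pairwise correlations among several variables. The sketch below is my own illustration with hypothetical data (it is not the study behind Table 22.2); it uses NumPy's corrcoef, which returns the full symmetric matrix, of which only the lower triangle is usually reported.

# Sketch: building a Pearson correlation matrix (hypothetical data, not Table 22.2's study).
import numpy as np

rng = np.random.default_rng(0)
n = 100
performance = rng.normal(50, 10, n)
job_satisfaction = 0.5 * performance + rng.normal(0, 8, n)   # built to correlate positively
role_ambiguity = -0.4 * performance + rng.normal(0, 9, n)    # built to correlate negatively

data = np.vstack([performance, job_satisfaction, role_ambiguity])
labels = ["Performance", "Job satisfaction", "Role ambiguity"]

corr = np.corrcoef(data)   # 3 x 3 symmetric matrix with 1.0 on the diagonal

# Print only the lower triangle, as in Table 22.2.
for i, row_label in enumerate(labels):
    cells = " ".join(f"{corr[i, j]:5.2f}" for j in range(i + 1))
    print(f"{row_label:18s} {cells}")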

bivariate linear regression: A measure of linear association that investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated.

intercept: An intercepted segment of a line. The point at which a regression line intersects the Y-axis.

slope: The inclination of a regression line as compared to a base line. Rise (vertical distance) over run (horizontal distance).

REGRESSION ANALYSIS

Regression is another technique for measuring the linear association between a dependent and an independent variable. Although regression and correlation are mathematically related, regression assumes the dependent (or criterion) variable, Y, is predictively linked to the independent (or predictor) variable, X. Regression analysis attempts to predict the values of a continuous, interval-scaled dependent variable from the specific values of the independent variable. For example, the amount of external funds required (the dependent variable) might be predicted on the basis of sales growth rates (the independent variable). Although there are numerous applications of regression analysis, forecasting sales is by far the most common.

The discussion here concerns bivariate linear regression. This form of regression investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated. The symbol α represents the Y intercept and β is the slope coefficient. The slope β is the change in Y due to a corresponding change of one unit in X. The slope may also be thought of as "rise over run" (the rise in units on the Y axis divided by the run in units along the X axis). (The Δ is the notation for "a change in.")

Suppose a researcher is interested in forecasting sales for a construction distributor (wholesaler) in Florida. Further, the distributor believes a reasonable association exists between sales and building permits issued by counties. Using bivariate linear regression on the data in Table 22.3, the researcher will be able to estimate sales potential (Y) in various counties based on the number of building permits (X).

For a better understanding of the data in Table 22.3, the data can be plotted on a scatter diagram (Exhibit 22.4). In the diagram the vertical axis indicates the value of the dependent variable Y and the horizontal axis indicates the value of the independent variable X. Each point in the diagram represents an observation of X and Y at a given point in time, that is, the paired values of Y and X.


Regression: One Step Backward

The essence of a dictionary definition of the word "regression" is a going back or moving backward. This notion of regressing, that things "go back to previous conditions," was the source for the original concept of statistical regression. Galton, who first worked out the concept of correlation, got the idea from thinking about "regression toward mediocrity," a phenomenon observed in studies of inheritance. "Tall men will tend to have shorter sons, and short men taller sons. The sons' heights, then, tend to 'regress to,' or 'go back to,' the mean of the population. Statistically, if we want to predict Y from X and the correlation between X and Y is zero, then our best prediction is to the mean." (Incidentally, the symbol r, used for the coefficient of correlation, was originally chosen because it stood for "regression.")


The relationship between X and Y could be "eyeballed," that is, a straight line could be drawn through the points in the figure. However, such a line would be subject to human error. Two researchers might draw different lines to describe the same data.

Least-Squares Method of Regression Analysis

The task of the researcher is to find the best means for fitting a straight line to the data. The least-squares method is a relatively simple mathematical technique that ensures that the straight line will best represent the relationship between X and Y. The logic behind the least-squares technique goes as follows. No straight line can completely represent every dot in the scatter diagram.

least-squares method: A mathematical technique ensuring that the regression line will best represent the linear relationship between X and Y.

TABLE 22.3 Relationship of Sales Potential to Building Permits Issued

Dealer   Dealer's Sales Volume, Y (000)   Building Permits, X
1        77                                86
2        79                                93
3        80                                95
4        83                                104
5        101                               139
6        117                               180
7        129                               165
8        120                               147
9        97                                119
10       106                               132
11       99                                126
12       121                               156
13       103                               129
14       86                                96
15       99                                108


EXHIBIT 22.4 Scatter Diagram and Eyeball Forecast

[Scatter diagram of the fifteen dealers' sales volumes (Y) plotted against building permits (X), with two different hand-drawn lines ("My line" and "Your line") fitted by eye to the same points.]

residual: The difference between the actual value of the dependent variable and the estimated value of the dependent variable in the regression equation.

Unless there is a perfect correlation between two variables, there will be a discrepancy between most of the actual scores (each dot) and the predicted score based on the regression line. Simply stated, any straight line that is drawn will generate errors. The method of least squares uses the criterion of attempting to make the least amount of total error in prediction of Y from X. More technically, the procedure used in the least-squares method generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line. Using the symbol e to represent the deviations of the dots from the line, the least-squares criterion is:

\sum_{i=1}^{n} e_i^2 \text{ is minimum}

where

e_i = Y_i - \hat{Y}_i (the "residual")
Y_i = actual value of the dependent variable
\hat{Y}_i = estimated value of the dependent variable ("Y hat")
n = number of observations
i = number of the observation


The general equation of a straight line equals Y = α + βX, whereas a more appropriate equation includes an allowance for error:

Y = \hat{\alpha} + \hat{\beta}X + e

The symbols \hat{\alpha} and \hat{\beta} are utilized when the equation is a regression estimate of the line. Thus, to compute the estimated values of α and β, we use the following formulas:

\hat{\beta} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}

and

\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}

where

\hat{\beta} = estimated slope of the line (the "regression coefficient")
\hat{\alpha} = estimated intercept of the Y axis
Y = dependent variable
\bar{Y} = mean of the dependent variable
X = independent variable
\bar{X} = mean of the independent variable
n = number of observations

TABLE 22.4 Least-Squares Computation

Dealer   Y     Y²        X     X²        XY
1        77    5,929     86    7,396     6,622
2        79    6,241     93    8,649     7,347
3        80    6,400     95    9,025     7,600
4        83    6,889     104   10,816    8,632
5        101   10,201    139   19,321    14,039
6        117   13,689    180   32,400    21,060
7        129   16,641    165   27,225    21,285
8        120   14,400    147   21,609    17,640
9        97    9,409     119   14,161    11,543
10       106   11,236    132   17,424    13,992
11       99    9,801     126   15,876    12,474
12       121   14,641    156   24,336    18,876
13       103   10,609    129   16,641    13,287
14       86    7,396     96    9,216     8,256
15       99    9,801     108   11,664    10,692

Ȳ = 99.8     ΣY² = 153,283     ΣX = 1,875 (X̄ = 125)     ΣX² = 245,759     ΣXY = 193,345


These equations may be solved by simple arithmetic (see Table 22.4). To estimate the relationship between the distributor's sales to a dealer and the number of building permits, the following manipulations are performed:

\hat{\beta} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}
            = \frac{15(193,345) - (1,875)(1,497)}{15(245,759) - (1,875)^2}
            = \frac{2,900,175 - 2,806,875}{3,686,385 - 3,515,625}
            = \frac{93,300}{170,760}
            = .54638

\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = 99.8 - .54638(125) = 99.8 - 68.3 = 31.5

The formula Ŷ = 31.5 + .546X is the regression equation used for the prediction of the dependent variable. Suppose the wholesaler considers a new dealership in an area where the number of building permits equals 89. Sales may be forecast in this area as:

Ŷ = 31.5 + .546(X) = 31.5 + .546(89) = 31.5 + 48.6 = 80.1

Thus our distributor may expect sales of 80.1 in this new area.

Calculation of the correlation coefficient gives an indication of how accurate the predictions may be. In this example the correlation coefficient is r = .9356, and the coefficient of determination is r² = .8754.
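The least-squares formulas and the forecast can be verified in a few lines of code. The sketch below is my own illustration (variable names are arbitrary); it applies the β̂ and α̂ formulas to the fifteen dealers in Table 22.3 and should reproduce β̂ of about .546, α̂ of about 31.5, a forecast of about 80 for an area with 89 building permits, and r of about .94.

# Sketch verifying the least-squares computation for Tables 22.3 and 22.4 (my own code).
import math

permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]   # X
sales   = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]       # Y (000)

n = len(permits)
sum_x, sum_y = sum(permits), sum(sales)
sum_xy = sum(x * y for x, y in zip(permits, sales))
sum_x2 = sum(x * x for x in permits)

# beta_hat = [n(sum XY) - (sum X)(sum Y)] / [n(sum X^2) - (sum X)^2]
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha_hat = sum_y / n - beta_hat * (sum_x / n)          # alpha_hat = Ybar - beta_hat * Xbar

print(round(beta_hat, 3), round(alpha_hat, 1))          # 0.546  31.5

# Forecast for a new area with 89 building permits:
print(round(alpha_hat + beta_hat * 89, 1))              # about 80.1

# Correlation coefficient and coefficient of determination for the same data:
sum_y2 = sum(y * y for y in sales)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 4), round(r ** 2, 4))                    # 0.9356  0.8754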

Drawing a Regression Line

To draw a regression line on the scatter diagram, only two predicted values of Y need plotting. For example, if Dealer 7 and Dealer 3 are used, Ŷ₇ and Ŷ₃ will be calculated to be 121.6 and 83.4:

Dealer 7 (actual Y value = 129):   Ŷ₇ = 31.5 + .546(165) = 121.6
Dealer 3 (actual Y value = 80):    Ŷ₃ = 31.5 + .546(95) = 83.4

Once the two Ŷ values have been predicted, a straight line connecting the points Ŷ₇ = 121.6, X₇ = 165, and Ŷ₃ = 83.4, X₃ = 95 can be drawn.
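The same two predicted points can also be used to draw the line programmatically. A minimal sketch (my own illustration, assuming matplotlib is available):

# Sketch: scatter diagram with the fitted regression line drawn through two predicted points.
import matplotlib.pyplot as plt

permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]
sales   = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]

alpha_hat, beta_hat = 31.5, 0.546            # estimates from the computation above

plt.scatter(permits, sales, label="Observed dealers")

# Two predicted points are enough to draw a straight line (Dealers 3 and 7, as in the text).
x_pts = [95, 165]
y_pts = [alpha_hat + beta_hat * x for x in x_pts]   # about 83.4 and 121.6
plt.plot(x_pts, y_pts, label="Least-squares line")

plt.xlabel("Building permits (X)")
plt.ylabel("Sales volume (Y, 000)")
plt.legend()
plt.show()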

Exhibit 22.5 shows the regression line. If it is desirable to determine the error (residual) of any observation, the predicted value of Y is first calculated; the predicted value is then subtracted from the actual value. For example, the actual observation for Dealer 8 is 120 and the predicted value is 111.8, so the residual is 120 - 111.8 = 8.2.


EXHIBIT 22.6 Scatter Diagram of Explained and Unexplained Variation

[Scatter diagram showing Dealer 8's actual sales relative to the regression line and to Ȳ, with the deviation explained by the regression and the unexplained (residual) deviation marked, and the slope indicated as ΔY/ΔX.]

The deviation explained by the regression is computed using Ŷ_i - Ȳ rather than Y_i - Ȳ. The smaller number, 8.2, is the deviation not explained by the regression.

Thus the total deviation can be partitioned into two parts:

(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)

Total deviation = Deviation explained by the regression + Deviation unexplained by the regression (residual error)

where

\bar{Y} = mean of the total group
\hat{Y}_i = value predicted with the regression equation
Y_i = actual value

For Dealer 8 the total deviation is 120 - 99.8 = 20.2, the deviation explained by the regression is 111.8 - 99.8 = 12, and the deviation unexplained by the regression is 120 - 111.8 = 8.2. If these values are summed over all values of Y_i (i.e., all observations) and squared, these deviations provide an estimate of the variation of Y explained by the regression and unexplained by the regression:

\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2

Total variation = Explained variation + Unexplained variation (residual)

We have thus partitioned the total sum of squares, SSt, into two parts: the regression sum of squares, SSr, and the error sum of squares, SSe:

SSt = SSr + SSe
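The partition can be checked numerically. The sketch below (my own illustration) reproduces the Dealer 8 figures and then sums the squared deviations over all fifteen dealers to recover SSt, SSr, and SSe, which should agree with the analysis of variance table presented later.

# Sketch: partitioning total deviation into explained and unexplained parts (my own code).
permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]
sales   = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]

alpha_hat, beta_hat = 31.5, 0.54638                     # estimates from the worked example
y_bar = sum(sales) / len(sales)                         # 99.8
y_hat = [alpha_hat + beta_hat * x for x in permits]     # predicted values

# Dealer 8 (index 7): X = 147, Y = 120, predicted value about 111.8
i = 7
print(sales[i] - y_bar)          # total deviation, about 20.2
print(y_hat[i] - y_bar)          # deviation explained by the regression, about 12.0
print(sales[i] - y_hat[i])       # unexplained (residual) deviation, about 8.2

# Summing squared deviations over all observations gives the sums of squares:
sst = sum((y - y_bar) ** 2 for y in sales)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))
print(round(sst, 1), round(ssr, 1), round(sse, 1))      # about 3882.4, 3398.5, 483.9 (SSt = SSr + SSe)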


The Concept of Beta When Investing in Stocks

Suppose a regression was run with the historic realized rate of return on a particular stock (K) as the dependent variable and the historic realized rate of return on the stock market (K_M) as the independent variable. The tendency of a stock to move with the market is reflected in its beta coefficient, which is a measure of the stock's volatility relative to an average stock. Betas are discussed at an intuitive level in this section.

An average risk stock is defined as one which tends to move up and down in step with the general market as measured by some index such as the Dow Jones or the New York Stock Exchange Index. Such a stock will, by definition, have a beta (β) of 1.0, which indicates that, in general, if the market moves up by 10 percent, the stock will also move up by 10 percent, while if the market falls by 10 percent, the stock will likewise fall by 10 percent. A portfolio of such β = 1.0 stocks will move up and down with the broad market averages and will be just as risky as the averages. If β = 0.5, the stock is only half as volatile as the market (it will rise and fall only half as much), and a portfolio of such stocks is half as risky as a portfolio of β = 1.0 stocks. On the other hand, if β = 2.0, the stock is twice as volatile as an average stock, so a portfolio of such stocks will be twice as risky as an average portfolio.

Betas are calculated and published by Merrill Lynch, Value Line, and numerous other organizations. The beta coefficients of some well-known companies, as calculated by Merrill Lynch, are shown in the table below. Most stocks have betas in the range of 0.75 to 1.50. The average for all stocks is 1.0 by definition.

Stock                      Beta
Apple Computer             1.60
Union Pacific              1.43
Georgia-Pacific            1.36
Mattel                     1.15
General Electric           1.09
Bristol Myers              1.00
General Motors             0.94
McDonald's                 0.93
Procter & Gamble           0.80
IBM                        0.70
Anheuser-Busch             0.58
Pacific Gas & Electric     0.47

If a high-beta stock (one whose beta is greater than 1.0) is added to an average risk (β = 1.0) portfolio, then the beta, and consequently the riskiness, of the portfolio will increase. Conversely, if a low-beta stock (one whose beta is less than 1.0) is added to an average risk portfolio, the portfolio's beta and risk will decline. Thus, because a stock's beta measures its contribution to the riskiness of the portfolio, beta is the appropriate measure of the stock's riskiness.
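In regression terms, a stock's beta is simply the estimated slope when the stock's returns are regressed on the market's returns. A minimal sketch (my own illustration; the return series are hypothetical and np.polyfit is used only as a convenient least-squares routine):

# Sketch: estimating a stock's beta as the slope of a bivariate regression (hypothetical data).
import numpy as np

rng = np.random.default_rng(1)
market_returns = rng.normal(0.01, 0.04, 60)                      # 60 months of market returns
stock_returns = 1.4 * market_returns + rng.normal(0, 0.03, 60)   # a stock built to be more volatile

beta, alpha = np.polyfit(market_returns, stock_returns, 1)       # slope = beta, intercept = alpha
print(round(beta, 2))   # roughly 1.4: the stock moves more than the market (a "high-beta" stock)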

F-test: A procedure used to determine if there is more variability in the scores of one sample than in the scores of another sample.

An F-test, or an analysis of variance applied to regression, can be used to test the relative magnitude of the SSr and SSe with their appropriate degrees of freedom. Table 22.5 indicates the technique for conducting the F-test.

TABLE 22.5 Analysis of Variance Table for Bivariate Regression

Source of Variation        Sum of Squares            Degrees of Freedom   Mean Square (Variance)
Explained by regression    SSr = Σ(Ŷ_i - Ȳ)²         k - 1                SSr / (k - 1)
Unexplained (error)        SSe = Σ(Y_i - Ŷ_i)²       n - k                SSe / (n - k)

where k = number of estimated parameters (variables) and n = number of observations.


analysis of variance summary table: A table that presents the results of a regression calculation.

TABLE 22.6 Analysis of Variance Summary Table for Regression of Sales on Building Permits

Source of Variation                      Sum of Squares   d.f.   Mean Square   F-Value
Explained by regression                  3,398.49         1      3,398.49      91.30
Unexplained by regression (error)        483.91           13     37.22
Total                                    3,882.40         14

For the example on sales forecasting, the analysis of variance summary table, comparing relative magnitudes of the mean squares, is presented in Table 22.6. From Table 6 in the Appendix we find that the F-value of 91.3, with 1 degree of freedom in the numerator and 13 degrees of freedom in the denominator, exceeds the probability level of .01. The coefficient of determination, r², reflects the proportion of variation explained by the regression line. To calculate r²:

r^2 = \frac{SSr}{SSt} = 1 - \frac{SSe}{SSt}

In our example, r² is calculated to be .875:

r^2 = \frac{3,398.49}{3,882.40} = .875

The coefficient of determination may be interpreted to mean that 87 percent of the variation in sales was explained by associating the variable with building permits.
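The entries in Table 22.6 follow directly from the sums of squares. The sketch below is my own illustration; the scipy call at the end is an optional check that the F-value exceeds the .01 level reported from the Appendix table.

# Sketch: building the ANOVA summary for the sales-on-permits regression (my own code).
from scipy.stats import f as f_dist   # only used for the optional p-value check

ssr, sse = 3398.49, 483.91             # sums of squares from the example
k, n = 2, 15                           # estimated parameters (intercept and slope), observations

msr = ssr / (k - 1)                    # mean square explained by regression
mse = sse / (n - k)                    # mean square error, about 37.22

f_value = msr / mse
r_squared = ssr / (ssr + sse)

print(round(f_value, 1))                 # about 91.3
print(round(r_squared, 3))               # about 0.875
print(f_dist.sf(f_value, k - 1, n - k))  # p-value well below .01, consistent with the Appendix table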

SUMMARY

In many situations two variables are interrelated or associated. Many bivariate statistical techniques can be used to measure association. Researchers select the appropriate technique on the basis of each variable's scale of measurement.

The correlation coefficient (r), a statistical measure of association between two variables, ranges from r = +1.0 for a perfect positive correlation to r = -1.0 for a perfect negative correlation. No correlation is indicated for r = 0. Simple correlation is the measure of the relationship of one variable to another. The correlation coefficient indicates the strength of the association of two variables and the direction of that association. It must be remembered that correlation does not prove causation, as variables other than those being measured may be involved. The coefficient of determination (r²) measures the amount of the total variance in the dependent variable that is accounted for by knowing the value of the independent variable. The results of correlation computations are often presented in a correlation matrix.

Bivariate linear regression investigates a straight-line relationship between one dependent variable and one independent variable. The regression can be done intuitively by plotting a scatter diagram of the X and Y points and drawing a line to fit the observed relationship. The least-squares method mathematically determines the best-fitting regression line for the observed data. The line determined by this method may be used to forecast values of the dependent variable, given a value for the independent variable.


8. A football team's season ticket sales, percentage of games won, and number of active alumni are given below:


Year    Season Ticket Sales   Percentage of Games Won   Number of Active Alumni
1985    4,995                 40                        NA
1986    8,599                 54                        NA
1987    8,479                 55                        NA
1988    8,419                 58                        NA
1989    10,253                63                        NA
1990    12,457                75                        6,315
1991    13,285                36                        6,860
1992    14,177                27                        8,423
1993    15,730                63                        9,000

a. Interpret the correlation between each variable.
b. Calculate: Regression sales = Percentage of games won.
c. Calculate: Regression sales = Number of active alumni.

9. Are the different forms of consumer installment credit in the table below highly correlated? Explain.

Credit Card Debt Outstanding (Millions of Dollars)

Year   Gas Cards   Travel and Entertainment Cards   Bank Credit Cards   Retail Cards   Total Credit Cards   Total Installment Credit
1      $939        $61                              $828                $9,400         $11,229              $79,428
2      1,119       76                               1,312               10,200         12,707               87,745
3      1,298       110                              2,639               10,900         14,947               98,105
4      1,650       122                              3,792               11,500         17,064               102,064
5      1,804       132                              4,490               13,925         20,351               111,295
6      1,762       164                              5,408               14,763         22,097               127,332
7      1,832       191                              6,838               16,395         25,256               147,437
8      1,823       238                              9,281               17,933         28,275               156,124
9      1,993       273                              9,501               18,002         29,669               164,955
10     1,981       238                              11,351              19,052         32,622               185,489
11     2,074       284                              14,262              21,082         37,702               216,572

10. A manufacturer of disposable washcloths/wipes told a retailer that sales for this product category closely correlated with the sales of disposable diapers. The retailer thought he would check this out for his own sales-forecasting purposes. Where might a researcher find data to make this forecast?

11. The Springfield Electric Company manufactures electric pencil sharpeners. The company believes that sales are correlated with the number of workers employed in specific geographical areas. The following table presents Springfield's