review of correlation & regression · 2017. 2. 27. · types of dependence ... correlation...

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet Review of Correlation & Regression Petra Petrovics



Review of Correlation & Regression

Petra Petrovics


Types of dependence

• association – between two nominal variables

• mixed – between a nominal and a ratio variable

• correlation – between ratio variables


• X (or X1, X2, …, Xp): known variable(s) / independent variable(s) / predictor(s)

• Y: unknown variable / dependent variable

• causal relationship: X "causes" Y to change

Correlation vs. regression:

• Correlation describes the strength of a relationship, the degree to which one variable is linearly related to another.

• Regression shows us how to determine the nature of a relationship between two or more variables.


Correlation Measures

1. Covariance

2. Coefficient of correlation

3. Coefficient of determination

4. Coefficient of rank correlation


Correlation Measures

1. Covariance

The covariance between two variables is a measure of the joint variation of the two variables

– ranges from −∞ to +∞;

– Cov = 0 when X and Y are uncorrelated;

– its sign shows the direction of the correlation;

– it does not measure the strength of the relationship (its value depends on the units of measurement)!

$$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
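
A minimal Python sketch of the covariance formula above (n − 1 in the denominator). The function name `sample_covariance` and the toy data are illustrative assumptions, not part of the original slides.

```python
def sample_covariance(x, y):
    """Sample covariance of paired observations (n - 1 in the denominator)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products of the deviations from the means
    cross = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return cross / (n - 1)


# Hypothetical toy data: y tends to rise with x, so the covariance is positive
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))                      # 1.5
# Same relationship, y expressed in units 100 times smaller
print(sample_covariance(x, [100 * v for v in y]))   # 150.0
```

Multiplying every y by 100 multiplies the covariance by 100 although the relationship itself is unchanged, which is why covariance alone does not measure the strength of a relationship.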


2. Coefficient of correlation (Pearson)

• its sign shows the direction of the correlation

• it measures the strength of the correlation

• −1 ≤ r ≤ 1:

0 < |r| < 1 statistical dependence

r = 0 X and Y are uncorrelated

r = −1 perfect negative linear relationship ☻

r = 1 perfect positive linear relationship ☺

• It should only be used for linear relationships!

$$r = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$$
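
The Pearson formula can be sketched in plain Python as follows; `pearson_r` is a hypothetical helper name and the data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: r = Cov(x, y) / (s_x * s_y), sample (n - 1) versions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_mean) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_mean) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)


x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 4))    # 0.7746 (positive, not perfect)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 6))   # -1.0 (perfect negative line)
```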


3. Coefficient of determination

• r²

• The square of the sample correlation coefficient between the outcomes and their predicted values.

• Expressed as a percentage (%), it shows what share of the variation in Y is explained by the model.

• It provides a measure of how well future outcomes are likely to be predicted by the model.

• Varies from 0 to 1.

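$$r^2 = \frac{S_{\hat{y}}}{S_y} = 1 - \frac{S_e}{S_y}$$

where S_y is the total sum of squares of Y, S_ŷ is the sum of squares explained by the regression and S_e is the sum of squared errors.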
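
A sketch of the two forms of the r² formula, assuming ŷ comes from a least-squares line with an intercept (only then do S_ŷ/S_y and 1 − S_e/S_y coincide). The helper name `r_squared` and the toy data are hypothetical.

```python
def r_squared(x, y):
    """Return (S_y_hat / S_y, 1 - S_e / S_y) for the least-squares line y_hat = b0 + b1 x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    b1 = sxy / sxx                   # least-squares slope
    b0 = y_mean - b1 * x_mean        # least-squares intercept
    y_hat = [b0 + b1 * a for a in x]
    s_y = sum((b - y_mean) ** 2 for b in y)                # total sum of squares
    s_e = sum((b - bh) ** 2 for b, bh in zip(y, y_hat))    # sum of squared errors
    s_y_hat = sum((bh - y_mean) ** 2 for bh in y_hat)      # explained sum of squares
    return s_y_hat / s_y, 1 - s_e / s_y


explained, one_minus = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(explained, 4), round(one_minus, 4))  # 0.6 0.6
```

For this data the Pearson r is about 0.7746, and 0.7746² ≈ 0.6, so the "square of the correlation" reading gives the same value.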


Example

• A firm administers a test to sales trainees before they go into the field. The management of the firm is interested in determining the relationship between the test scores and the sales made by the trainees at the end of one year in the field. The following data were collected for 45 sales personnel who have been in the field for one year.

• Calculate the different correlation measures!


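Salesperson | Test score x_i | Units sold y_i | d_x = x_i − x̄ | d_y = y_i − ȳ | d_x·d_y
K. A. | 25 | 188 | +9 | +22 | +198
L. Z. | 16 | 157 | 0 | −9 | 0
B. E. | 30 | 165 | +14 | −1 | −14
G. P. | 5 | 124 | −11 | −42 | +462
… | … | … | … | … | …
… | … | … | … | … | …
S. G. | 10 | 158 | −6 | −8 | +48
J. T. | 24 | 224 | +8 | +58 | +464
V. P. | 17 | 169 | +1 | +3 | +3
T. L. | 6 | 114 | −10 | −52 | +520
Total | 716 | 7 464 | 0 | 0 | ∑d_x d_y = 8 894.5

X = test score (independent variable), Y = number of units sold (dependent variable).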


Number of observed pairs: n = 45

x̄ = 16, s_x = 8.26

ȳ = 166, s_y = 30.99

$$C = \frac{\sum d_x d_y}{n - 1} = \frac{8\,894.5}{45 - 1} = 202.15$$

Positive correlation (C > 0).


$$r = \frac{C}{s_x s_y} = \frac{202.15}{8.26 \cdot 30.99} = 0.7897, \qquad r^2 = 62.36\%$$

There is a strong, positive relationship between test scores and the number of units sold.

The variation in test scores explains 62.36 percent of the variation in the number of units sold.
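
The slide arithmetic can be reproduced from the reported summary statistics. The full 45-row table is not listed, so this sketch starts from ∑d_x d_y, s_x and s_y exactly as given above.

```python
# Summary statistics as reported on the slides (n = 45 salespeople)
n = 45
sum_dxdy = 8894.5          # sum of (x_i - x_mean) * (y_i - y_mean)
s_x, s_y = 8.26, 30.99     # standard deviations of test scores and units sold

C = sum_dxdy / (n - 1)     # covariance
r = C / (s_x * s_y)        # Pearson correlation coefficient
r2 = r ** 2                # coefficient of determination

print(f"C = {C:.2f}, r = {r:.4f}, r^2 = {r2:.2%}")  # C = 202.15, r = 0.7897, r^2 = 62.36%
```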


4. Coefficient of rank correlation (Spearman)

• Measures the relationship between two ordinal variables

• n = number of paired observations,

d_i = difference between the ranks of each pair of observations.

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, \qquad 0 \le |r_s| \le 1$$

• perfect correlation: r_s = 1

perfect inverse correlation: r_s = −1

in case of independence: r_s = 0


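Example

Ten students were ranked by their mathematical and musical ability:

Student | A | B | C | D | E | F | G | H | I | J | Total
Mathematics rank (x_i) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | –
Music rank (y_i) | 3 | 4 | 1 | 2 | 5 | 7 | 10 | 6 | 8 | 9 | –
d_i = x_i − y_i | −2 | −2 | 2 | 2 | 0 | −1 | −3 | 2 | 1 | 1 | 0
d_i² | 4 | 4 | 4 | 4 | 0 | 1 | 9 | 4 | 1 | 1 | 32

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \cdot 32}{10\,(10^2 - 1)} = 0.806$$

→ strong relationship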
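
A minimal sketch of the Spearman formula applied to the ranks in the table above. It assumes there are no tied ranks (the simple formula is exact only in that case); the function name is hypothetical.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); assumes no tied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equally long rank lists with at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # sum of d_i^2
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Ranks of students A-J from the example
mathematics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
music = [3, 4, 1, 2, 5, 7, 10, 6, 8, 9]
print(round(spearman_rank_correlation(mathematics, music), 3))  # 0.806
```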


Simple Linear Regression Model

• We model the relationship between two variables, X and Y, as a straight line.

• The model contains two parameters:

an intercept parameter,

a slope parameter.

Y = β0 + β1x + ε

Y = deterministic component + random error

where: Y – dependent or response variable (the variable we wish to explain or predict)

x – independent or predictor variable

ε – random error component

β0 – y-intercept of the line, i.e. the point at which the line intercepts the y-axis

β1 – slope of the line

[Figure: the line E(y) = β0 + β1x plotted against x, with β0 marked as the y-intercept and β1 as the slope]
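
To illustrate Y = β0 + β1x + ε, the sketch below simulates data and checks that the least-squares estimates land near the true β0 and β1. The slides do not specify the error distribution: the normal, mean-zero error, the parameter values and the helper name are all assumptions of this sketch.

```python
import random


def simulate_linear_model(beta0, beta1, xs, sigma, seed=0):
    """Draw y = beta0 + beta1 * x + eps, with eps ~ Normal(0, sigma) (assumed)."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [i / 20 for i in range(200)]  # 200 x values from 0 to 9.95
ys = simulate_linear_model(beta0=1.0, beta1=0.5, xs=xs, sigma=1.0, seed=42)

# Least-squares estimates b0, b1 of the two parameters
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
      / sum((x - x_mean) ** 2 for x in xs))
b0 = y_mean - b1 * x_mean
print(f"b0 = {b0:.3f} (true beta0 = 1.0), b1 = {b1:.3f} (true beta1 = 0.5)")
```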


[Figure: observations y scattered around a line plotted against x, showing the deterministic component and the random error]

• y = deterministic component + random error

• We always assume that the mean value of the random error equals 0, so the mean value of y equals the deterministic component.

• It is possible to find many lines for which the sum of the errors is equal to 0, but there is one (and only one) line for which the SSE (sum of squares of the errors) is a minimum: the least squares line, or regression line.

$$\hat{y}_i = b_0 + b_1 x_i$$
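
The claim that many lines have a zero error sum while only one minimizes the SSE can be checked numerically. This sketch uses hypothetical toy data and forces every candidate line through (x̄, ȳ), which guarantees a zero error sum; the helper name is illustrative.

```python
def line_errors(b0, b1, x, y):
    """Errors e_i = y_i - (b0 + b1 * x_i) of a candidate line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]   # hypothetical toy data
y = [2, 4, 5, 4, 5]
x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)

results = {}
for slope in (0.0, 0.6, 1.0):
    # Any line through (x_mean, y_mean) has errors that sum to 0
    intercept = y_mean - slope * x_mean
    e = line_errors(intercept, slope, x, y)
    results[slope] = (sum(e), sum(ei ** 2 for ei in e))
    print(f"slope={slope}: sum of errors={results[slope][0]:+.6f}, SSE={results[slope][1]:.4f}")
```

For this data 0.6 is the least-squares slope: all three lines have a zero error sum, but the SSE values are 6.0, 2.4 and 4.0, so only the least-squares line attains the minimum.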


• The method of least squares gives us the best linear unbiased estimators (BLUE) of the regression parameters β0 and β1 (under the classical Gauss-Markov assumptions).

• The least-squares estimators:

b0 estimates β0

b1 estimates β1

• The (empirical) regression line, where ŷ is read "y hat":

$$\hat{y} = b_0 + b_1 x$$

• Calculation of the estimators:

$$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \to \min!$$


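Least Squares Method

• There is an extremum (minimum) where the partial derivatives are equal to 0:

$$\frac{\partial f}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$$

$$\frac{\partial f}{\partial b_1} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)\, x_i = 0$$

• After rearranging, we get the normal equations (with one x):

$$\sum y = n b_0 + b_1 \sum x$$

$$\sum xy = b_0 \sum x + b_1 \sum x^2$$

• The estimated regression line:

$$\hat{y} = b_0 + b_1 x$$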
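
The two normal equations form a 2×2 linear system in b0 and b1. A sketch solving it with Cramer's rule (an implementation choice, not prescribed by the slides) on hypothetical toy data:

```python
def fit_normal_equations(x, y):
    """Solve sum(y) = n*b0 + b1*sum(x) and sum(xy) = b0*sum(x) + b1*sum(x^2)."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    det = n * sxx - sx ** 2            # zero only if all x values are equal
    if det == 0:
        raise ValueError("x values must not all be equal")
    b1 = (n * sxy - sx * sy) / det     # Cramer's rule for the 2x2 system
    b0 = (sy - b1 * sx) / n            # back-substitute into the first equation
    return b0, b1


b0, b1 = fit_normal_equations([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)  # 2.2 0.6
```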


Interpretation

• b0: when x = 0, ŷ = b0, i.e. the estimated value of Y when the X variable is 0.

• b1: for every 1-unit increase in x, we expect y to change by b1 units on average,

i.e. b1 is the average difference in Y between observations whose X values differ by 1.


No relationship

[Figure: scatter diagram of the number of births against the number of storks]


Independence

[Figure: scatter diagram titled "No correlation", fitted line Y = -7.4E-02 + 0.208348X, R-Sq = 3.4%]


Positive correlation

[Figure: scatter diagram titled "Positive correlation", fitted line Y = -8.6E-02 + 0.690286X, R-Sq = 62.5%]


Negative correlation

[Figure: scatter diagram titled "Negative correlation", fitted line Y = 5.07E-02 - 0.647872X, R-Sq = 70.9%]


Curvilinear relation

[Figure: scatter diagram titled "Nonlinear correlation", fitted curve Y = 12.0958 + 6.07684X + 1.16686X², R-Sq = 88.4%]


Scatter diagrams

[Figure: four scatter diagrams: wastage vs. production (number of products per day); sales in $ vs. advertising in $; selling price vs. age of a house (year); selling price vs. age of a car (year). Panel labels: direct relationship (positive slope), inverse relationship (negative slope), linear, curvilinear.]


Power regression

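$$Y = a X^{b}$$

$$\lg Y = \lg a + b \lg X$$

This is linear in the logarithms:

$$V = b_0 + b_1 \cdot \lg X, \qquad V = \lg Y,\; b_1 = b,\; b_0 = \lg a$$

Normal equations:

$$\sum \lg y = n b_0 + b_1 \sum \lg x$$

$$\sum \lg x \lg y = b_0 \sum \lg x + b_1 \sum (\lg x)^2$$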
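
A sketch of power regression via the log transformation above. It uses base-10 logarithms to match lg and requires strictly positive data; note that least squares here minimizes squared errors in lg Y rather than in Y. The helper name and data are hypothetical.

```python
import math


def fit_power(x, y):
    """Fit Y = a * X**b by least squares on V = lg Y against lg X (x, y > 0)."""
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("power regression needs strictly positive x and y")
    u = [math.log10(v) for v in x]   # lg X
    v = [math.log10(w) for w in y]   # V = lg Y
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    b1 = (n * suv - su * sv) / (n * suu - su ** 2)  # b1 = b
    b0 = (sv - b1 * su) / n                          # b0 = lg a
    return 10 ** b0, b1                              # back-transform: a = 10**b0


x = [1, 2, 3, 4, 5]
y = [2 * xi ** 3 for xi in x]   # hypothetical data lying exactly on Y = 2 X^3
a, b = fit_power(x, y)
print(round(a, 6), round(b, 6))  # 2.0 3.0
```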


Compound regression

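$$Y = a\, b^{x}$$

$$\lg Y = \lg a + x \lg b$$

This is linear in x:

$$V = b_0 + b_1 x, \qquad V = \lg Y,\; b_1 = \lg b,\; b_0 = \lg a$$

Normal equations:

$$\sum \lg y = n b_0 + b_1 \sum x$$

$$\sum x \lg y = b_0 \sum x + b_1 \sum x^2$$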
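
The compound model can be fitted the same way: regress lg Y on x and back-transform both coefficients. A sketch under the same assumptions (base-10 logs, positive y, hypothetical helper and data):

```python
import math


def fit_compound(x, y):
    """Fit Y = a * b**x by least squares on lg Y = lg a + x * lg b (y > 0)."""
    if any(w <= 0 for w in y):
        raise ValueError("compound regression needs strictly positive y")
    v = [math.log10(w) for w in y]   # V = lg Y
    n = len(x)
    sx, sv = sum(x), sum(v)
    sxv = sum(p * q for p, q in zip(x, v))
    sxx = sum(p * p for p in x)
    b1 = (n * sxv - sx * sv) / (n * sxx - sx ** 2)  # b1 = lg b
    b0 = (sv - b1 * sx) / n                          # b0 = lg a
    return 10 ** b0, 10 ** b1                        # a = 10**b0, b = 10**b1


x = [0, 1, 2, 3, 4]
y = [3 * 2 ** xi for xi in x]   # hypothetical data lying exactly on Y = 3 * 2^x
a, b = fit_compound(x, y)
print(round(a, 6), round(b, 6))  # 3.0 2.0
```

Here b = 2 means that each 1-unit increase in x multiplies the fitted Y by 2.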


Estimation in Regression

• Regression estimation is a technique used to replace missing values in data.

• If we know:

1. the estimated parameter value;

2. the hypothesized value of the parameter;

3. the confidence interval around the estimated parameter.

• The number of degrees of freedom equals the number of observations minus the number of parameters estimated.

• In simple linear regression two parameters (β0 and β1) are estimated, so the number of degrees of freedom is n − 2.


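Estimation in Regression

For each parameter: estimated value, standard error and confidence interval (s_e is the standard error of the regression; t is the critical value of Student's t distribution with n − 2 degrees of freedom).

• β0, estimated by b0:

$$s_{b_0} = s_e \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}}, \qquad b_0 \pm t\, s_{b_0}$$

• β1, estimated by b1:

$$s_{b_1} = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}, \qquad b_1 \pm t\, s_{b_1}$$

• Y0 in case of average Y values, estimated by ŷ0:

$$s_{\hat{y}_0} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}, \qquad \hat{y}_0 \pm t\, s_{\hat{y}_0}$$

• Y0 in case of individual Y values, estimated by ŷ0:

$$s_{y_0} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}, \qquad \hat{y}_0 \pm t\, s_{y_0}$$

Degrees of freedom: n − 2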
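
A sketch of the standard-error and interval formulas above. To stay within the standard library, the t critical value is passed in by the caller rather than computed (an assumption of this sketch); the function name and toy data are hypothetical.

```python
import math


def regression_intervals(x, y, x0, t_crit):
    """Return {name: (lower, upper, standard_error)} for simple linear regression.

    t_crit must be the critical value of Student's t with n - 2 degrees of freedom
    (e.g. read from a t table); it is an input here, not computed.
    """
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)          # sum of (x_i - x_mean)^2
    b1 = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sxx
    b0 = y_mean - b1 * x_mean
    s_e = math.sqrt(sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2))

    se = {
        "b0": s_e * math.sqrt(sum(a * a for a in x) / (n * sxx)),
        "b1": s_e / math.sqrt(sxx),
        "mean_y0": s_e * math.sqrt(1 / n + (x0 - x_mean) ** 2 / sxx),            # average Y
        "individual_y0": s_e * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx),  # individual Y
    }
    y0_hat = b0 + b1 * x0
    estimates = {"b0": b0, "b1": b1, "mean_y0": y0_hat, "individual_y0": y0_hat}
    return {k: (estimates[k] - t_crit * se[k], estimates[k] + t_crit * se[k], se[k])
            for k in se}


# Hypothetical toy data; 3.182 is the two-sided 95% t value for 5 - 2 = 3 df
res = regression_intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
for name, (low, high, s) in res.items():
    print(f"{name:<14} se={s:.4f}  interval=({low:.3f}, {high:.3f})")
```

The interval for an individual Y value is always wider than the one for the average Y value, because of the extra 1 under the square root.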


Elasticity

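$$E(y, x) = \frac{b_1 x}{b_0 + b_1 x}$$

Elasticity at the mean:

$$E(\bar{y}, \bar{x}) = b_1 \frac{\bar{x}}{\bar{y}}$$

Elasticity = % change in y / % change in x, i.e. the approximate percentage change in y associated with a 1% change in x.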
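
A sketch of the two elasticity formulas for a hypothetical fitted line ŷ = 2.2 + 0.6x with x̄ = 3 and ȳ = 4. Because the least-squares line passes through (x̄, ȳ), the point elasticity at x̄ equals the elasticity at the mean. Function names are illustrative.

```python
def point_elasticity(b0, b1, x):
    """Elasticity of the fitted line y_hat = b0 + b1*x at x: b1*x / (b0 + b1*x)."""
    y_hat = b0 + b1 * x
    if y_hat == 0:
        raise ValueError("elasticity is undefined where the fitted value is 0")
    return b1 * x / y_hat


def elasticity_at_mean(b1, x_mean, y_mean):
    """Elasticity at the mean: b1 * x_mean / y_mean."""
    return b1 * x_mean / y_mean


print(round(point_elasticity(2.2, 0.6, 3), 4))   # 0.45
print(round(elasticity_at_mean(0.6, 3, 4), 4))   # 0.45
```

An elasticity of 0.45 reads as: around the mean, a 1% increase in x goes with roughly a 0.45% increase in y.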


Residual variable

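$$e_i = y_i - \hat{y}_i \quad\Rightarrow\quad y_i = \hat{y}_i + e_i \quad\Rightarrow\quad y_i - \bar{y} = (\hat{y}_i - \bar{y}) + e_i$$

Squaring and summing (the cross-product term vanishes for the least-squares line):

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$S_y = S_{\hat{y}} + S_e$$

S_y: sum of squares of Y; S_ŷ: sum of squares explained by the regression; S_e: sum of squares of the errors.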
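
A numeric check of the decomposition S_y = S_ŷ + S_e for a least-squares line with an intercept; the helper name and toy data are hypothetical.

```python
def sum_of_squares_decomposition(x, y):
    """Return (S_y, S_y_hat, S_e) for the least-squares line with an intercept."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
          / sum((a - x_mean) ** 2 for a in x))
    b0 = y_mean - b1 * x_mean
    y_hat = [b0 + b1 * a for a in x]
    residuals = [b - bh for b, bh in zip(y, y_hat)]        # e_i = y_i - y_hat_i
    s_y = sum((b - y_mean) ** 2 for b in y)                # total
    s_y_hat = sum((bh - y_mean) ** 2 for bh in y_hat)      # explained by regression
    s_e = sum(e ** 2 for e in residuals)                   # errors
    return s_y, s_y_hat, s_e


s_y, s_y_hat, s_e = sum_of_squares_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(s_y, 6), round(s_y_hat, 6), round(s_e, 6))  # 6.0 3.6 2.4
```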


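Analysis of Variance in Regression Analysis

Source | Sum of squares | df | Mean sum of squares | F
Regression | S_ŷ = Σ(ŷ_i − ȳ)² | 1 | s_ŷ² = S_ŷ / 1 | F = S_ŷ / (S_e / (n − 2))
Residual | S_e = Σ(y_i − ŷ_i)² | n − 2 | s_e² = S_e / (n − 2) |
Total | S_y = Σ(y_i − ȳ)² | n − 1 | s_y² = S_y / (n − 1) |

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad S_y = S_{\hat{y}} + S_e$$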


Model testing

H0: β1 = 0

H1: β1 ≠ 0 (linear model)

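Test statistic:

$$F = \frac{S_{\hat{y}}}{S_e / (n - 2)} = \frac{s_{\hat{y}}^2}{s_e^2}$$

• The F-statistic tests whether all the slope coefficients in a linear regression are equal to 0.

• It measures how well the regression equation explains the variation in the dependent variable.

Decision rule: with ν1 = 1 and ν2 = n − 2 degrees of freedom, H0 is rejected at significance level α if F > F_{1−α}(ν1; ν2); otherwise H0 is accepted.

[Figure: F distribution with the acceptance region of H0 and the critical value(s) bounding the rejection region(s)]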
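
A sketch that builds the ANOVA table above and the F statistic for hypothetical toy data; the critical value F_{1−α}(1; n − 2) would still be read from an F table. The helper name is illustrative.

```python
def anova_f(x, y):
    """ANOVA table of a simple linear regression and F = S_y_hat / (S_e / (n - 2))."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
          / sum((a - x_mean) ** 2 for a in x))
    b0 = y_mean - b1 * x_mean
    y_hat = [b0 + b1 * a for a in x]
    s_y_hat = sum((bh - y_mean) ** 2 for bh in y_hat)       # regression
    s_e = sum((b - bh) ** 2 for b, bh in zip(y, y_hat))     # residual
    s_y = sum((b - y_mean) ** 2 for b in y)                 # total
    table = {  # source: (sum of squares, df, mean sum of squares)
        "Regression": (s_y_hat, 1, s_y_hat / 1),
        "Residual": (s_e, n - 2, s_e / (n - 2)),
        "Total": (s_y, n - 1, s_y / (n - 1)),
    }
    f_stat = table["Regression"][2] / table["Residual"][2]
    return table, f_stat


table, f_stat = anova_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical toy data
for source, (ss, df, ms) in table.items():
    print(f"{source:<10} SS={ss:.4f}  df={df}  MS={ms:.4f}")
print(f"F = {f_stat:.4f}")  # F = 4.5000
```

For α = 0.05 the critical value F_{0.95}(1; 3) is about 10.13, so with this toy data H0: β1 = 0 would not be rejected.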


Parameter testing

H0: β1 = 0

H1: β1 ≠ 0

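Test statistic:

$$t = \frac{b_1}{s(b_1)}$$

where: b1 is the least squares estimate of the regression slope,

s(b1) is the standard error of b1.

Decision rule at significance level α, with n − 2 degrees of freedom:

• H1: β1 ≠ 0: reject H0 if |t| > t_{1−α/2}

• H1: β1 > 0: reject H0 if t > t_{1−α}

• H1: β1 < 0: reject H0 if t < −t_{1−α}

[Figure: t distribution with the rejection regions of H0 for the three alternative hypotheses]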
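
A sketch of the slope t statistic for hypothetical toy data. In simple linear regression t² equals the F statistic of the model test, so the two-sided t test and the F test lead to the same decision. The helper name is illustrative.

```python
import math


def slope_t_test(x, y):
    """Return (t, df) with t = b1 / s(b1) for H0: beta1 = 0 and df = n - 2."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    b1 = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sxx
    b0 = y_mean - b1 * x_mean
    s_e = math.sqrt(sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2))
    s_b1 = s_e / math.sqrt(sxx)          # standard error of b1
    return b1 / s_b1, n - 2


t_stat, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t_stat, 4), df)  # 2.1213 3
```

Here t² ≈ 4.5, which equals S_ŷ / (S_e / (n − 2)) = 3.6 / 0.8 for this data. With 3 degrees of freedom the two-sided 5% critical value t_{0.975} is about 3.182, so H0 would not be rejected.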


Thanks for your attention!