gy2100 geographical data analysis lecture 4 regression analysis and statistical inference department...


Upload: madeleine-melling

Post on 30-Mar-2015


Page 1: GY2100 Geographical Data Analysis Lecture 4 Regression analysis and statistical inference DEPARTMENT OF GEOGRAPHY

[Figure: sorting plotted against mean grain size (phi) for different depositional environments: braided stream, point bar, stream mouth bar, tidal flat, interdistrib. beach, mature beach and dune.]

Page 2

GY2100 Geographical Data Analysis

Lecture 4: Regression analysis and statistical inference

DEPARTMENT OF GEOGRAPHY

Page 3

The statistical utility of a regression line

Page 4

The regression model and its underlying assumptions

$Y_i = \alpha + \beta X_i + \delta_i$

Systematic or deterministic component represented by a straight line

Random or stochastic component represented by the deviations of the observations about the line

where α is the intercept, β is the slope and $\delta_i$ is the disturbance (error) term.

Page 5

Illustrating regression using the fixed X model
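The fixed X model can be illustrated with a short simulation. This is a minimal sketch in Python, assuming hypothetical values α = 900, β = 2.4 and a disturbance standard deviation σ = 240 (invented for illustration, not estimates from any data in this lecture). X is held at fixed values and many Y values are drawn at each one; the mean of Y at each X should sit close to α + βX (the systematic component), while the spread reflects the random component δ.

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only)
ALPHA, BETA, SIGMA = 900.0, 2.4, 240.0
FIXED_X = [0, 100, 200, 300, 400, 500]   # X values fixed in advance, measured without error

def simulate_y(x, rng):
    """One draw of Y = alpha + beta * x + delta, with delta ~ N(0, SIGMA**2)."""
    return ALPHA + BETA * x + rng.gauss(0.0, SIGMA)

rng = random.Random(42)   # seeded so the run is repeatable
DRAWS_PER_X = 2000
mean_y = {}
for x in FIXED_X:
    ys = [simulate_y(x, rng) for _ in range(DRAWS_PER_X)]
    mean_y[x] = statistics.fmean(ys)
    print(f"X={x:>3}  mean Y={mean_y[x]:8.1f}  "
          f"alpha+beta*X={ALPHA + BETA * x:8.1f}  sd={statistics.stdev(ys):6.1f}")
```

Each printed mean should be close to α + βX and each standard deviation close to σ, which is what assumption 3 below describes.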

Page 6

Assumptions of the regression model

1. The relationship between X and Y is linear;

2. Values of X are fixed and measured without error;

3. The disturbance terms $\delta_i$ are normally distributed with equal variance about the line $Y = \alpha + \beta X$, and each has an expected value $E(\delta_i) = 0$. This means that the expected value of Y for a given value of X is $E(Y_i \mid X_i) = \alpha + \beta X_i$;

Page 7

Assumptions of the regression model

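4. The $\delta_i$ are statistically uncorrelated:

a. There is no autocorrelation (each $\delta_i$ is uncorrelated with the other disturbances $\delta_j$, $j \neq i$, e.g. successive observations in a sequence)

b. There is no spatial autocorrelation ($\delta_i$ at neighbouring locations are not correlated with one another).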

Page 8

Testing the assumptions

1. Specific statistical tests using residuals of the sample regression line as estimates of the error term in the true population regression model

2. Histogram of residuals

3. Examination of residual plots
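The lecture does not name the specific tests. One widely used example for the no-autocorrelation assumption (4a) is the Durbin-Watson statistic, d = Σ(e_t − e_{t−1})² / Σe_t², computed on residuals arranged in sequence order. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation and values towards 4 negative autocorrelation. The sketch below assumes plain Python lists and made-up residual series; the helper name `durbin_watson` is illustrative, and formal significance bounds would still come from published Durbin-Watson tables.

```python
def durbin_watson(residuals):
    """Illustrative helper: Durbin-Watson statistic for residuals in sequence order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between successive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series, invented for illustration
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every step: negative autocorrelation
trending = [-3, -2, -1, 1, 2, 3]      # smooth drift: positive autocorrelation
print(f"alternating: d = {durbin_watson(alternating):.2f}")   # d = 3.33
print(f"trending:    d = {durbin_watson(trending):.2f}")      # d = 0.29
```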

Page 9

Examination of residual plots

Page 10

Recap

Yi = a + bXi + di

Page 11

Inferences in regression analysis

Slope parameter (β)

Intercept parameter (α)

Precision of estimates derived from the sample regression equation

Page 12

Testing if β = 0

Elevation above sea level vs mean annual rainfall for Scotland

Y' = 2.38X + 895

[Figure: mean annual rainfall (mm yr⁻¹) plotted against elevation (m).]

If β = 0, then Y' = constant for all X.

Page 13

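Testing β = 0 using the t test

Hypotheses: H0: β = 0

H1: β ≠ 0

Under H0, repeated sampling yields a sampling distribution of b centred on the expected value β = 0; standardised by its estimated standard error $s_b$, b follows a t distribution with n − 2 degrees of freedom.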
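A simulation can make this sampling statement concrete. The sketch below assumes H0 is true (β = 0), fixes 20 hypothetical X values, draws normal disturbances with an invented standard deviation, and refits the line many times. If the assumptions hold, the proportion of samples in which |b/s_b| exceeds the tabulated t_crit = 2.101 (df = 18, α = 0.05, two-tailed) should be close to 0.05. The helper `fit_slope_and_se` and all parameter values are illustrative assumptions.

```python
import math
import random

def fit_slope_and_se(xs, ys):
    """Illustrative helper: least-squares slope b and its estimated standard error s_b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(resid_ss / (n - 2))   # standard error of the estimate, df = n - 2
    return b, s_yx / math.sqrt(sxx)

rng = random.Random(1)                # seeded so the run is repeatable
xs = [30 * i for i in range(20)]      # 20 fixed, hypothetical X values
T_CRIT = 2.101                        # t-table value, df = 18, alpha = 0.05 two-tailed
reps = 4000
rejections = 0
for _ in range(reps):
    ys = [900 + rng.gauss(0, 240) for _ in xs]   # H0 true: beta = 0 (assumed values)
    b, s_b = fit_slope_and_se(xs, ys)
    if abs(b / s_b) > T_CRIT:
        rejections += 1
rejection_rate = rejections / reps
print(f"proportion of |t| > {T_CRIT}: {rejection_rate:.3f}")   # should be near 0.05
```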

Page 14

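Testing β = 0 using the t test

The test statistic to be calculated is

$$t_{n-2} = \frac{b - \beta}{s_b}$$

Since we are testing β = 0, this becomes

$$t_{n-2} = \frac{b}{s_b}, \qquad s_b = \frac{s_{yx}}{\sqrt{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 / n}}$$

where

df = n − 2 is the number of degrees of freedom

$s_b$ = estimated standard error of the sampling distribution of b

$s_{yx}$ = standard error of the estimate, $\sqrt{\sum_{i=1}^{n} (Y_i - Y'_i)^2 / (n-2)}$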
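As a check on the formula, the sketch below computes b, a, s_yx, s_b and t = b/s_b from raw X and Y values using the ΣX² − (ΣX)²/n form. The 20 rainfall observations are not reproduced in the lecture, so the five data points are invented purely for illustration, and the function name `slope_t_test` is hypothetical.

```python
import math

def slope_t_test(xs, ys):
    """Hypothetical helper: return (a, b, s_yx, s_b, t) for the regression of ys on xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum_x2 - sum_x ** 2 / n            # sum of X^2 minus (sum of X)^2 / n
    sxy = sum_xy - sum_x * sum_y / n
    b = sxy / sxx                             # slope
    a = (sum_y - b * sum_x) / n               # intercept
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(resid_ss / (n - 2))      # standard error of the estimate, df = n - 2
    s_b = s_yx / math.sqrt(sxx)               # standard error of b
    return a, b, s_yx, s_b, b / s_b

# Invented data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, s_yx, s_b, t = slope_t_test(xs, ys)
print(f"a={a:.2f} b={b:.2f} s_yx={s_yx:.3f} s_b={s_b:.3f} t={t:.2f}")
```

For these five points b = 0.6 and s_b = √0.08 ≈ 0.283, giving t ≈ 2.12 with 3 df.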

Page 15

Regression in Excel

Regression Statistics

Multiple R 0.78

R square 0.61

Adj R square 0.59

Standard error 242.79

Observations 20

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |

| | Coeffs | Standard Error | t stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 895 | 149.76 | 5.98 | 1.178E-5 | 581 | 1210 |
| Elevation, m | 2.38 | 0.444 | 5.36 | 4.31E-5 | 1.44 | 3.31 |
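The Regression Statistics block can be recovered from the ANOVA sums of squares, which shows how the figures relate. This sketch assumes one explanatory variable (k = 1) and works from the printed, rounded values, so in general the last digit could differ from Excel's.

```python
import math

# Values copied from the Excel output above
rss = 1691558        # regression sum of squares
resid_ss = 1061062   # residual sum of squares
n, k = 20, 1         # observations, explanatory variables (k = 1 assumed)

tss = rss + resid_ss
r_square = rss / tss
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
std_error = math.sqrt(resid_ss / (n - k - 1))   # standard error of the estimate, s_yx

print(f"TSS={tss}  R={multiple_r:.2f}  R2={r_square:.2f}  "
      f"adj R2={adj_r_square:.2f}  SE={std_error:.2f}")
# TSS=2752620  R=0.78  R2=0.61  adj R2=0.59  SE=242.79
```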

Page 16

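Testing β = 0 using the t test

Since we are testing β = 0, the test statistic to be calculated,

$$t_{n-2} = \frac{b - \beta}{s_b},$$

reduces to

$$t_{n-2} = \frac{b}{s_b}$$

So

$$t = \frac{2.38}{0.444} = 5.36$$

With df = n − 2 = 18, $t_{crit}$ = 2.1 at α = 0.05. Since 5.36 > 2.1, H0: β = 0 is rejected.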
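The same arithmetic in Python, using the slope and standard error from the Excel output. The critical value 2.101 is the two-tailed 5% t-table value for 18 df, which the slide rounds to 2.1; it is hard-coded here as an assumption rather than computed.

```python
b, s_b = 2.38, 0.444     # slope and its standard error from the Excel output
n = 20
df = n - 2
T_CRIT = 2.101           # two-tailed t-table value, df = 18, alpha = 0.05

t = b / s_b
reject_h0 = abs(t) > T_CRIT
print(f"t = {t:.2f} with {df} df; reject H0 (beta = 0): {reject_h0}")
# t = 5.36 with 18 df; reject H0 (beta = 0): True
```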


Page 18

Testing β = Q using the t test (Q ≠ 0)

The test statistic to be calculated is

$$t_{n-2} = \frac{b - \beta}{s_b}$$

If we are testing β = 4, then

$$t_{n-2} = \frac{2.38 - 4}{0.444} = -3.65$$
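Only the numerator changes when the hypothesised slope is some value Q other than 0. A minimal sketch with a hypothetical helper `slope_t_statistic`, reusing b = 2.38 and s_b = 0.444 from the Excel output and the same tabulated critical value (assumed, not computed):

```python
def slope_t_statistic(b, s_b, q=0.0):
    """Hypothetical helper: t = (b - q) / s_b for H0: beta = q, with n - 2 df."""
    return (b - q) / s_b

T_CRIT = 2.101   # two-tailed t-table value, df = 18, alpha = 0.05
t_q4 = slope_t_statistic(2.38, 0.444, q=4.0)
print(f"t = {t_q4:.2f}; reject H0 (beta = 4): {abs(t_q4) > T_CRIT}")
# t = -3.65; reject H0 (beta = 4): True
```

Since |t| = 3.65 exceeds 2.101, a slope of 4 is rejected at α = 0.05. This agrees with the Excel output, where 4 lies outside the 95% interval of 1.44 to 3.31.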

Page 19

Testing β = 0 using the F test

Decomposition of the variance using sums of squares

Total sum of squares = Regression sum of squares + Residual sum of squares

$$\text{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - Y'_i)^2$$

Page 20

Testing β = 0 using the F test

Decomposition of the variance using sums of squares

Total sum of squares (TSS) is the sum of the squared deviations of the individual observations about their mean:

$$\text{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Regression sum of squares (RSS) is the sum of the squared deviations of the predicted Y values (Y') about the mean of Y:

$$\text{RSS} = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2$$

The difference between TSS and RSS is 'unexplained' by the regression line and is therefore the residual sum of squares (Residual SS):

$$\text{Residual SS} = \sum_{i=1}^{n} (Y_i - Y'_i)^2$$
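The identity TSS = RSS + Residual SS holds for any least-squares line fitted with an intercept, and it can be checked numerically. The sketch below uses five invented data points (not the rainfall data) and a hypothetical helper `sums_of_squares`; RSS follows this lecture's usage, i.e. the regression sum of squares.

```python
def sums_of_squares(xs, ys):
    """Hypothetical helper: return (TSS, RSS, Residual SS) for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    y_pred = [a + b * x for x in xs]                             # Y'
    tss = sum((y - y_bar) ** 2 for y in ys)                      # total
    rss = sum((yp - y_bar) ** 2 for yp in y_pred)                # regression
    resid = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))      # residual
    return tss, rss, resid

tss, rss, resid = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(tss, rss, resid)   # approximately 6.0, 3.6, 2.4 (floating-point rounding may show)
```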

Page 21

Testing β = 0 using the F test

Decomposition of the variance using sums of squares

TSS = RSS + Resid SS

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - Y'_i)^2$$


Page 23

Regression in Excel

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | RSS | RSS/df | RMSS/Resid MSS | |
| Residual | n − 2 | Resid SS | Resid SS/df | | |
| Total | n − 1 | TSS | TSS/df | | |

RMSS = Regression MSS

Resid MSS = Residual MSS
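The template translates directly into code. A minimal sketch, assuming simple regression (one explanatory variable) and taking the sums of squares from the rainfall output; the helper name `anova_simple_regression` is illustrative. It also shows that in simple regression F equals the square of the slope's t statistic (5.36² ≈ 28.7), so the F test and the t test of β = 0 lead to the same conclusion. Sig F (the p-value) needs the F distribution and is left to Excel or tables.

```python
def anova_simple_regression(rss, resid_ss, n):
    """Illustrative helper: ANOVA rows for simple regression and F = RMSS / Resid MSS."""
    df_reg, df_resid = 1, n - 2
    rmss = rss / df_reg               # regression mean sum of squares
    resid_mss = resid_ss / df_resid   # residual mean sum of squares
    return {
        "Regression": (df_reg, rss, rmss),
        "Residual": (df_resid, resid_ss, resid_mss),
        "Total": (n - 1, rss + resid_ss),
        "F": rmss / resid_mss,
    }

table = anova_simple_regression(1691558, 1061062, n=20)
print(table)
t_slope = 2.38 / 0.444               # slope t statistic from the Excel output
print(f"F = {table['F']:.1f}, t^2 = {t_slope ** 2:.1f}")   # F = 28.7, t^2 = 28.7
```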

Page 24

Regression in Excel

ANOVA

| | df | SS | MS | F | Sig F |
|---|---|---|---|---|---|
| Regression | 1 | 1691558 | 1691558 | 28.7 | 4.31E-5 |
| Residual | 18 | 1061062 | 58948 | | |
| Total | 19 | 2752620 | | | |

With 1 and 18 df, $F_{crit}$ = 4.41 at α = 0.05. Since F = 28.7 > 4.41, H0: β = 0 is rejected.

Page 25

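Constructing a confidence interval for β

In general

$$b - t_{crit}\, s_b \le \beta \le b + t_{crit}\, s_b$$

where $t_{crit}$ is the two-tailed critical value of t for n − 2 df.

For the rainfall data

$$2.38 - 2.1 \times 0.444 \le \beta \le 2.38 + 2.1 \times 0.444$$

$$1.45 \le \beta \le 3.31$$

(Excel, working from unrounded values, reports 1.44 to 3.31.)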
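The interval can be scripted in a few lines. This sketch uses the rounded coefficients and standard errors from the Excel output and the tabulated t value 2.101 (df = 18, assumed from tables rather than computed); because the inputs are rounded, the bounds may differ from Excel's Lower 95% and Upper 95% columns in the last digit. The helper name `confidence_interval` is illustrative.

```python
T_CRIT = 2.101   # two-tailed t-table value, df = 18, 95% confidence

def confidence_interval(estimate, std_error, t_crit=T_CRIT):
    """Illustrative helper: return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

slope_ci = confidence_interval(2.38, 0.444)
intercept_ci = confidence_interval(895, 149.76)
print(f"slope: {slope_ci[0]:.2f} to {slope_ci[1]:.2f}")              # slope: 1.45 to 3.31
print(f"intercept: {intercept_ci[0]:.0f} to {intercept_ci[1]:.0f}")  # intercept: 580 to 1210
```

Because 0 lies outside the slope interval, the interval leads to the same conclusion as the t test of β = 0.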
