gy2100 geographical data analysis lecture 4 regression analysis and statistical inference department...
Post on 30-Mar-2015
[Figure: sorting plotted against mean grain size (phi) for sediments from braided stream, point bar, stream mouth bar, tidal flat, interdistributary beach, mature beach and dune environments]
GY2100 Geographical Data Analysis
Lecture 4: Regression analysis and statistical inference
DEPARTMENT OF GEOGRAPHY
The statistical utility of a regression line
The regression model and its underlying assumptions
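$Y_i = \alpha + \beta X_i + \delta_i$
Systematic or deterministic component ($\alpha + \beta X_i$), represented by a straight line
Random or stochastic component ($\delta_i$), represented by the deviations of the observations about the line
where α (alpha) is the intercept, β (beta) is the slope and δ_i (delta) is the disturbance term.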
Illustrating regression using the fixed X model
Assumptions of the regression model
1. The relationship between X and Y is linear;
2. Values of X are fixed and measured without error;
3. The disturbance terms δ_i are normally distributed with equal variance about the line Y = α + βX, and each has an expected value E(δ_i) = 0. This means that the expected value of Y for a given value of X is E(Y_i | X_i) = α + βX_i;
4. The δ_i are statistically uncorrelated:
a. There is no autocorrelation (each δ_i is uncorrelated with the other disturbances and with X);
b. There is no spatial autocorrelation (the δ_i at nearby locations are not correlated with one another).
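The fixed-X model and its assumptions can be made concrete by simulation. The sketch below is a minimal illustration, not the lecture's own method: it assumes hypothetical population values α = 900, β = 2.5 and σ = 240, holds 20 X values fixed, draws normally distributed disturbances with mean 0 and equal variance, and refits the line to each of many repeated samples. Under these assumptions the estimated slopes b should cluster around β, with a spread close to the theoretical standard error σ / sqrt(Σ(X_i − X̄)²).

```python
import math
import random
import statistics

# Hypothetical population parameters (assumptions for illustration only)
ALPHA, BETA, SIGMA = 900.0, 2.5, 240.0
X = [30.0 * k for k in range(20)]  # fixed X values, measured without error


def fit_slope(x, y):
    """Least-squares slope b = Sxy / Sxx."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(42)  # fixed seed so the run is reproducible
slopes = []
for _ in range(2000):
    # Y_i = alpha + beta * X_i + delta_i, with delta_i ~ N(0, sigma^2)
    y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
    slopes.append(fit_slope(X, y))

x_bar = statistics.fmean(X)
theoretical_se = SIGMA / math.sqrt(sum((xi - x_bar) ** 2 for xi in X))
mean_b = statistics.fmean(slopes)
sd_b = statistics.stdev(slopes)
print(f"mean of b = {mean_b:.3f}, sd of b = {sd_b:.3f}, theory = {theoretical_se:.3f}")
```

Only the disturbances change between samples; X stays fixed, which is what "fixed X" means here.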
Testing the assumptions
1. Specific statistical tests, using the residuals of the sample regression line as estimates of the error term in the true population regression model;
2. Histogram of residuals;
3. Examination of residual plots.
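A minimal sketch of the residual checks listed above, assuming a small hypothetical data set (the Scottish rainfall observations are not reproduced here). It computes residuals from a least-squares fit, prints a text histogram of them, and makes a crude equal-variance comparison between the lower and upper halves of X. The bin width and the half-split comparison are illustrative choices, not a formal test from the lecture.

```python
import math
import statistics

# Hypothetical example data (illustration only; not the lecture's data set)
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [25, 41, 58, 83, 95, 122, 131, 162, 171, 198]

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# With an intercept in the model, OLS residuals sum to zero and are
# uncorrelated with X by construction: arithmetic checks, not assumption tests.
print("sum of residuals:", round(sum(residuals), 10))

# Text histogram of residuals (bin width is an arbitrary choice)
width = 4.0
bins = {}
for r in residuals:
    k = math.floor(r / width)
    bins[k] = bins.get(k, 0) + 1
for k in sorted(bins):
    print(f"[{k * width:+5.1f}, {(k + 1) * width:+5.1f}) " + "#" * bins[k])

# Crude equal-variance check: residual spread for upper vs lower halves of X
order = sorted(range(len(x)), key=lambda i: x[i])
half = len(x) // 2
low = [residuals[i] for i in order[:half]]
high = [residuals[i] for i in order[half:]]
variance_ratio = statistics.variance(high) / statistics.variance(low)
print("variance ratio (upper/lower X):", round(variance_ratio, 2))
```

A ratio far from 1 would hint that the equal-variance assumption is doubtful; a plot of residuals against X or Y' shows the same thing visually.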
Examination of residual plots
Recap
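$Y_i = a + bX_i + d_i$
where a and b are the sample estimates of α and β, and the residuals d_i are estimates of the disturbances δ_i.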
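The sample coefficients a and b are the ordinary least-squares estimates. A minimal sketch of the standard formulas b = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² and a = Ȳ − bX̄; the function name `fit_line` is hypothetical.

```python
import statistics


def fit_line(x, y):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Usage: points lying exactly on Y = 895 + 2.38X recover those coefficients
x = [0, 100, 200, 300, 400, 500]
y = [895 + 2.38 * xi for xi in x]
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))  # 895.0 2.38
```

With real data the points do not lie on the line, and the residuals d_i = Y_i − (a + bX_i) carry the scatter.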
Inferences in regression analysis
Slope parameter (β)
Intercept parameter (α)
Precision of estimates derived from the sample regression equation
Testing if β = 0
Elevation above sea level vs mean annual rainfall for Scotland
Y' = 2.38X + 895
[Figure: scatter plot of mean annual rainfall (mm yr⁻¹) against elevation (m), with the fitted regression line]
If β = 0, then Y' = constant for all X.
Testing β = 0 using the t test
Hypotheses: H0: β = 0
H1: β ≠ 0
Under H0, repeated sampling yields a distribution of b centred on an expected value of β = 0; the ratio of b to its estimated standard error follows a t distribution with n − 2 degrees of freedom.
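Testing β = 0 using the t test
The general test statistic is
$t_{n-2} = \frac{b - \beta}{s_b}$
where
$s_b = \frac{s_{yx}}{\sqrt{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 / n}}$
df = n − 2 is the number of degrees of freedom
s_b = estimated standard error of the sampling distribution of b
s_yx = standard error of the estimate, $s_{yx} = \sqrt{\sum_{i=1}^{n} (Y_i - Y'_i)^2 / (n - 2)}$ (reported by Excel as "Standard error")
Since we are testing β = 0, the test statistic to be calculated is
$t_{n-2} = \frac{b}{s_b}$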
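The formulas above can be sketched directly in code. This is a minimal illustration using a hypothetical five-point data set (not the rainfall data); `slope_t_test` is a hypothetical helper name. It writes Σ(X_i − X̄)² in the shortcut form ΣX_i² − (ΣX_i)²/n used in the expression for s_b.

```python
import math


def slope_t_test(x, y):
    """Return (b, s_yx, s_b, t) for H0: beta = 0, with df = n - 2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Sxx written as sum(X^2) - (sum X)^2 / n, as in the formula for s_b
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n
    resid_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(resid_ss / (n - 2))  # standard error of the estimate
    s_b = s_yx / math.sqrt(sxx)           # standard error of b
    return b, s_yx, s_b, b / s_b


# Hypothetical toy data (not the rainfall data)
b, s_yx, s_b, t = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b = {b:.2f}, s_b = {s_b:.4f}, t = {t:.3f} with df = 3")
# b = 0.60, s_b = 0.2828, t = 2.121 with df = 3
```

The resulting t is then compared with the critical value of t for n − 2 degrees of freedom.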
Regression in EXCEL
Regression Statistics
Multiple R        0.78
R square          0.61
Adj R square      0.59
Standard error    242.79
Observations      20

ANOVA
              df   SS        MS        F      Sig F
Regression     1   1691558   1691558   28.7   4.31E-5
Residual      18   1061062   58948
Total         19   2752620

               Coefficients   Standard error   t stat   P-value    Lower 95%   Upper 95%
Intercept      895            149.76           5.98     1.178E-5   581         1210
Elevation, m   2.38           0.444            5.36     4.31E-5    1.44        3.31
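The Regression Statistics block in the Excel output can be reproduced from the ANOVA sums of squares. A minimal sketch, taking the SS values and n = 20 from the table; the adjusted R² uses the usual form for one predictor, 1 − (1 − R²)(n − 1)/(n − 2).

```python
import math

# Sums of squares and n taken from the Excel ANOVA table for the rainfall data
REG_SS, RESID_SS, TOTAL_SS, N = 1691558, 1061062, 2752620, 20

r_square = REG_SS / TOTAL_SS
multiple_r = math.sqrt(r_square)              # |r| for a simple regression
adj_r_square = 1 - (1 - r_square) * (N - 1) / (N - 2)
std_error = math.sqrt(RESID_SS / (N - 2))     # s_yx, standard error of the estimate

print(f"Multiple R {multiple_r:.2f}, R square {r_square:.2f}, "
      f"Adj R square {adj_r_square:.2f}, Standard error {std_error:.2f}")
```

Note that the "Standard error" in this block is s_yx, not the standard error of b.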
So, for the rainfall data:
$t_{18} = \frac{2.38}{0.444} = 5.36$
With df = n − 2 = 18, t_crit = 2.1 at α = 0.05. Since 5.36 > 2.1, H0: β = 0 is rejected.
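The same t test can be sketched with SciPy, which is an assumption about tooling (any t table gives the same critical value). b and s_b are the rounded values from the Excel output, so the p-value will be close to, but not exactly, Excel's 4.31E-5.

```python
from scipy import stats

b, s_b, n = 2.38, 0.444, 20    # rounded values from the Excel output
df = n - 2
t_stat = b / s_b
t_crit = stats.t.ppf(1 - 0.05 / 2, df)     # two-tailed test at alpha = 0.05
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-tailed p-value
print(f"t = {t_stat:.2f}, t_crit = {t_crit:.3f}, p = {p_value:.2e}")
print("reject H0: beta = 0" if abs(t_stat) > t_crit else "do not reject H0")
```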
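Testing β = Q using the t test (Q ≠ 0)
The test statistic to be calculated is
$t_{n-2} = \frac{b - Q}{s_b}$
If we are testing β = 4, then
$t_{18} = \frac{2.38 - 4}{0.444} = -3.65$
Since |t| = 3.65 > t_crit = 2.1, H0: β = 4 is rejected at α = 0.05.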
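A minimal sketch of the test against a non-zero hypothesised slope; `t_for_hypothesised_slope` is a hypothetical helper name, and the critical value 2.1 is the one quoted in the lecture for df = 18 and α = 0.05.

```python
def t_for_hypothesised_slope(b, s_b, q):
    """t statistic for H0: beta = q (q = 0 gives the usual b / s_b)."""
    return (b - q) / s_b


T_CRIT = 2.1  # from the lecture: df = 18, alpha = 0.05, two-tailed

t_stat = t_for_hypothesised_slope(2.38, 0.444, 4.0)
print(f"t = {t_stat:.2f}")  # t = -3.65
print("reject H0: beta = 4" if abs(t_stat) > T_CRIT else "do not reject H0: beta = 4")
```

This agrees with the 95% confidence interval for β (1.44 to 3.31), which does not contain 4.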
Testing β = 0 using the F test
Decomposition of the variance using sums of squares
Total sum of squares = Regression sum of squares + Residual sum of squares
$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - Y'_i)^2$
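The decomposition can be checked numerically. A minimal sketch on a hypothetical five-point data set; following this lecture's convention, RSS is the regression sum of squares (many texts use "RSS" for the residual sum of squares instead, so the naming is worth checking when reading other sources).

```python
import statistics


def sums_of_squares(x, y):
    """Return (TSS, RSS, Residual SS) for a least-squares line Y' = a + bX."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                     # total
    rss = sum((fi - y_bar) ** 2 for fi in fitted)                # regression
    resid_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    return tss, rss, resid_ss


# Hypothetical toy data (not the rainfall data)
tss, rss, resid_ss = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(tss, 6), round(rss, 6), round(resid_ss, 6))  # 6.0 3.6 2.4
```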
Decomposition of the variance using sums of squares
Total sum of squares (TSS) is the sum of the squared deviations of the individual observations about their mean:
$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
Regression sum of squares (RSS) is the sum of the squared deviations of the predicted Y values (Y') about the mean of Y:
$RSS = \sum_{i=1}^{n} (Y'_i - \bar{Y})^2$
The difference between TSS and RSS is 'unexplained' by the regression line and is therefore the residual sum of squares (Residual SS):
$\text{Residual SS} = \sum_{i=1}^{n} (Y_i - Y'_i)^2$
Regression in Excel
ANOVA
              df    SS         MS            F                Sig F
Regression    1     RSS        RSS/df        RMSS/Resid MSS
Residual      n-2   Resid SS   Resid SS/df
Total         n-1   TSS        TSS/df
RMSS = Regression mean sum of squares (Regression MSS)
Resid MSS = Residual mean sum of squares (Residual MSS)
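The ANOVA layout above translates directly into a few lines of code. A minimal sketch, assuming SciPy is available for the upper-tail F probability (`scipy.stats.f.sf`); `anova_table` is a hypothetical helper name, and the inputs are the rainfall SS values from the Excel output.

```python
from scipy import stats


def anova_table(reg_ss, resid_ss, n):
    """Return (Regression MSS, Residual MSS, F, Sig F) for a simple regression."""
    df_reg, df_resid = 1, n - 2
    reg_ms = reg_ss / df_reg
    resid_ms = resid_ss / df_resid
    f_stat = reg_ms / resid_ms
    sig_f = stats.f.sf(f_stat, df_reg, df_resid)  # P(F >= f_stat) under H0
    return reg_ms, resid_ms, f_stat, sig_f


reg_ms, resid_ms, f_stat, sig_f = anova_table(1691558, 1061062, 20)
print(f"MS {reg_ms:.0f} / {resid_ms:.0f}, F = {f_stat:.1f}, Sig F = {sig_f:.2e}")
```

For a simple regression, F equals the square of the slope's t statistic (5.36² ≈ 28.7), so the two tests give the same conclusion.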
Regression in Excel
ANOVA
              df   SS        MS        F      Sig F
Regression     1   1691558   1691558   28.7   4.31E-5
Residual      18   1061062   58948
Total         19   2752620
With 1 and 18 df, F_crit = 4.41 at α = 0.05. Since F = 28.7 > 4.41, H0: β = 0 is rejected.
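The critical F value can be looked up with SciPy (an assumption about tooling; an F table gives the same number). The sketch also shows that, with 1 numerator df, F_crit equals the square of the two-tailed t_crit for the same residual df.

```python
from scipy import stats

alpha, df_resid = 0.05, 18
f_crit = stats.f.ppf(1 - alpha, 1, df_resid)
t_crit = stats.t.ppf(1 - alpha / 2, df_resid)
print(f"F_crit = {f_crit:.2f}, t_crit^2 = {t_crit ** 2:.2f}")
print("reject H0: beta = 0" if 28.7 > f_crit else "do not reject H0")
```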
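Constructing a confidence interval for β
In general
$b - t \cdot s_b \le \beta \le b + t \cdot s_b$
where t is the critical value of t with n − 2 df at the chosen α.
For the rainfall data
$2.38 - 2.1 \times 0.444 \le \beta \le 2.38 + 2.1 \times 0.444$
$1.44 \le \beta \le 3.31$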
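A minimal sketch of the interval calculation, assuming SciPy for the critical value; `slope_confidence_interval` is a hypothetical helper name. Because b and s_b are entered rounded, the lower limit comes out at about 1.45 rather than Excel's 1.44.

```python
from scipy import stats


def slope_confidence_interval(b, s_b, n, level=0.95):
    """Two-sided confidence interval for beta using t with n - 2 df."""
    t_crit = stats.t.ppf(1 - (1 - level) / 2, n - 2)
    half_width = t_crit * s_b
    return b - half_width, b + half_width


lower, upper = slope_confidence_interval(2.38, 0.444, 20)
print(f"{lower:.2f} <= beta <= {upper:.2f}")
```

Any hypothesised β outside this interval (such as β = 0 or β = 4) would be rejected by the corresponding two-tailed t test at α = 0.05.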