MATH 2016 (13177) Statistical Modelling


TRANSCRIPT

Page 1: MATH 2016 (13177) Statistical Modelling


MATH 2016 (13177) Statistical Modelling

Course is about designing experiments and using linear models to analyze data, both from experiments and surveys.

Course coordinator: Chris Brien

Page 2: MATH 2016 (13177) Statistical Modelling


I. Statistical inference

I.A Expected values and variances

I.B The linear regression model

I.C Model selection

a) Obtaining parameter estimates

b) Regression analysis of variance

I.D Summary

Page 3: MATH 2016 (13177) Statistical Modelling

I.A Expected values and variances

• Statistical inference is about drawing conclusions about one or more populations based on samples from the populations.

• Compute statistics or estimates from samples.

• They are used as estimates of particular population quantities, these being called parameters.

• Important to be clear about the distinction — when one is talking about a mean, is it the population or sample mean?

• To aid in making the distinction, convention is to use Greek letters as symbols for parameters and ordinary Roman letters as symbols for statistics.

• Fundamental in this course are population expected value and variance.

Page 4: MATH 2016 (13177) Statistical Modelling

Expected value

• Expected value = the mean of the variable Y in a population — it is a population parameter.

Definition I.1: The expected value of a continuous random variable Y whose population distribution is described by f(y) is given by

$\psi_Y = E[Y] = \int_{-\infty}^{\infty} y\, f(y)\, dy$

• That is, $\psi_Y = E[Y]$ is the mean in a population whose distribution is described by f(y).

Page 5: MATH 2016 (13177) Statistical Modelling

Properties of expected values

Theorem I.1: Let Y be a continuous random variable with probability distribution function f(y). The expected value of a function u(Y) of the random variable is

$E[u(Y)] = \int_{-\infty}^{\infty} u(y)\, f(y)\, dy$

Proof: not given.

• Note that any function of a random variable is itself a random variable.
• Use the above theorem in the next theorem.
• Theorem I.2: $E[a\, v(Y) + b] = a\, E[v(Y)] + b$

Page 6: MATH 2016 (13177) Statistical Modelling

Proof of Theorem I.2

• For a continuous random variable, we have from theorem I.1

$E[a\, v(Y) + b] = \int_{-\infty}^{\infty} \left(a\, v(y) + b\right) f(y)\, dy$

$\qquad = \int_{-\infty}^{\infty} a\, v(y) f(y)\, dy + \int_{-\infty}^{\infty} b\, f(y)\, dy$

$\qquad = a \int_{-\infty}^{\infty} v(y) f(y)\, dy + b \int_{-\infty}^{\infty} f(y)\, dy$

$\qquad = a\, E[v(Y)] + b$

since $\int_{-\infty}^{\infty} f(y)\, dy = 1$.

• In particular, $E[aY + b] = a\, E[Y] + b$.

Page 7: MATH 2016 (13177) Statistical Modelling

Variance

Definition I.2: The variance of any random variable Y is defined to be

$\operatorname{var}[Y] = \sigma_Y^2 = E\left[(Y - \psi_Y)^2\right] = E[Y^2] - \left(E[Y]\right)^2$

• That is, the variance is the mean in the population of the squares of the deviations of the observed values from the population mean.

• It measures how far on average observations are from the mean in the population.

• It is also a population parameter.

Page 8: MATH 2016 (13177) Statistical Modelling

Variance (cont’d)

Theorem I.3: The variance of a continuous random variable Y whose population distribution is described by f(y) is given by

$\operatorname{var}[Y] = \sigma_Y^2 = \int_{-\infty}^{\infty} (y - \psi_Y)^2 f(y)\, dy$

Proof: This is a straightforward application of theorem I.1 where $u(Y) = (Y - \psi_Y)^2$.

Page 9: MATH 2016 (13177) Statistical Modelling

Normal distribution parameters and estimators

• Common in this course.
• The distribution function for such a variable involves the parameters $\psi_Y$ and $\sigma_Y^2$ as follows:

$f(y) = \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\left\{ -\frac{(y - \psi_Y)^2}{2\sigma_Y^2} \right\}$

• So we want to estimate $\psi_Y$, and we have a sample $y_1, y_2, \ldots, y_n$.

• Note the lower case y for observed values as opposed to Y for the random variable.

• The obvious estimator of $\psi$ (drop subscript) is the sample mean

$\bar{Y} = \sum_{i=1}^{n} Y_i \big/ n$

Page 10: MATH 2016 (13177) Statistical Modelling

Estimators

• Note we call the formula that tells us how to estimate a parameter an estimator, and it is a function of random variables, Ys.

• The value obtained by substituting the sample values into the formula is called the estimate, and it is a function of observed values, ys.

• It is common practice to denote the estimator as the parameter with a caret over it.
– $\hat{\psi} = \bar{Y}$ means that the estimator of $\psi$ is $\bar{Y}$.
– $\hat{\psi}$ also stands for the estimate, so that $\hat{\psi} = \bar{y}$ means that the estimate of $\psi$ is $\bar{y}$.
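In R terms, the estimator/estimate distinction is the formula versus its value on the data; a tiny sketch with illustrative values:

    y <- c(50, 40, 52, 47, 65)   # a sample of observed values
    mean(y)                      # the estimate: psi-hat = ybar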

Page 11: MATH 2016 (13177) Statistical Modelling

I.B The linear regression model

• We consider models of the general form:

$Y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_p x_p + \varepsilon$

where
• Y is a continuous random variable and
• the $x_i$s are quantitative variables that are called the explanatory variables.
• This model is a linear model in the $\theta_i$s.
• For example:

$Y = \theta_0 + \theta_1 x_1 + \varepsilon$

$Y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \varepsilon$

$Y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_{12} x_1 x_2 + \varepsilon$

$Y = \theta_0 + \theta_1 \ln(x_1) + \varepsilon$

$Y = \theta_0 e^{\theta_1 x_1} + \varepsilon$

$Y = \theta_0 + \theta_1 e^{\theta_2 x_1} + \varepsilon$

All but the last two are linear in the $\theta_i$s.

Page 12: MATH 2016 (13177) Statistical Modelling

The model for n observations

• Would conduct a study in which n (≥ p + 1) observations are taken of Y and the $x_i$s.
• Leads to the following system of equations that model the observed responses:

$Y_1 = \theta_0 + \theta_1 x_{11} + \theta_2 x_{12} + \cdots + \theta_p x_{1p} + \varepsilon_1$

$Y_2 = \theta_0 + \theta_1 x_{21} + \theta_2 x_{22} + \cdots + \theta_p x_{2p} + \varepsilon_2$

$\qquad \vdots$

$Y_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_p x_{ip} + \varepsilon_i$

$\qquad \vdots$

$Y_n = \theta_0 + \theta_1 x_{n1} + \theta_2 x_{n2} + \cdots + \theta_p x_{np} + \varepsilon_n$

• What does the model tell us about our data?
– Have a response variable, Y, whose values are related to several explanatory variables, the $x_i$s (lower case x as they are not random variables).
– The $\varepsilon_i$s, random errors, account for differences in the values of the response variable for the same combination of values of the explanatory variables.

Page 13: MATH 2016 (13177) Statistical Modelling

Usual extra assumptions about the $\varepsilon_i$s

$E[\varepsilon_i] = 0, \quad \operatorname{var}[\varepsilon_i] = \sigma^2 \quad \text{and} \quad \operatorname{cov}[\varepsilon_i, \varepsilon_j] = 0,\; i \neq j$

• These mean:
– on average the errors cancel out so that we get the population value of the response,
– the variability of the errors is independent of the values of any of the variables,
– the error in one observation is unrelated to that of any other observation.

• The last assumption involves a third quantity involving expectations: covariance.

Page 14: MATH 2016 (13177) Statistical Modelling

Covariance

Definition I.3: The covariance of two random variables, X and Y, is defined to be

$\operatorname{cov}[X, Y] = E\left[(X - E[X])(Y - E[Y])\right]$

• The covariance measures the extent to which the values of the two random variables move together.
• In fact, the linear correlation coefficient can be calculated from it as follows:

$\operatorname{corr}[X, Y] = \frac{\operatorname{cov}[X, Y]}{\sqrt{\operatorname{var}[X]\, \operatorname{var}[Y]}}$

• That is, the correlation coefficient is just the covariance adjusted or standardized for the variance of X and Y.
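To make the standardization concrete, a small R sketch (the data values are purely illustrative):

    x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
    y <- c(2.3, 4.1, 3.0, 6.2, 5.5)
    cov(x, y) / sqrt(var(x) * var(y))   # covariance standardized by the variances
    cor(x, y)                           # agrees with the correlation coefficient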

Page 15: MATH 2016 (13177) Statistical Modelling

Matrix notation for the system of equations

• Matrices in bolded upper case letters and
• vectors in bolded lower case, except vectors of random variables, which will be in upper case.
• Thus, in matrix terms, let

$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_i \\ \vdots \\ Y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{i1} & x_{i2} & \cdots & x_{ip} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad \boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_p \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{bmatrix}$

• The system of equations can be written

$\mathbf{Y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}, \quad \text{with } E[\boldsymbol{\varepsilon}] = \mathbf{0} \text{ and } \operatorname{var}[\boldsymbol{\varepsilon}] = \mathbf{V} = \sigma^2 \mathbf{I}_n,$

where $\mathbf{I}_n$ is the $n \times n$ identity matrix.

Page 16: MATH 2016 (13177) Statistical Modelling

Expectation and variance of a random vector

Definition I.4: Let $\mathbf{Y}$ be a vector of n jointly-distributed random variables with

$E[Y_i] = \psi_i, \quad \operatorname{var}[Y_i] = \sigma_i^2 \quad \text{and} \quad \operatorname{cov}[Y_i, Y_j] = \sigma_{ij} = \sigma_{ji}.$

• Then, the random vector is

$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$

• The expectation vector, $\boldsymbol{\psi}$, giving the expectation of $\mathbf{Y}$, is

$\boldsymbol{\psi} = E[\mathbf{Y}] = \begin{bmatrix} E[Y_1] \\ E[Y_2] \\ \vdots \\ E[Y_n] \end{bmatrix} = \begin{bmatrix} \psi_1 \\ \psi_2 \\ \vdots \\ \psi_n \end{bmatrix}$

Page 17: MATH 2016 (13177) Statistical Modelling

Expectation and variance of a random vector (cont’d)

• The variance matrix, $\mathbf{V}$, giving the variance of $\mathbf{Y}$, is

$\mathbf{V} = \operatorname{var}[\mathbf{Y}] = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1i} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2i} & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ \sigma_{1i} & \sigma_{2i} & \cdots & \sigma_i^2 & \cdots & \sigma_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{in} & \cdots & \sigma_n^2 \end{bmatrix}$

• Note the transpose in the last expression.

Page 18: MATH 2016 (13177) Statistical Modelling

Lemma I.1: The transpose of a matrix (selected properties)

2. The transpose of a column vector is a row vector and vice versa, so that we always write the column vector as untransposed and the row vector as transposed — $\mathbf{a}$ is a column vector and $\mathbf{a}'$ is the corresponding row vector.

4. The transpose of a product is the product of the transposes, but with the order of the matrices reversed — $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$.

9. A column vector premultiplied by its transpose is the sum of squares of its elements, also a scalar — $\mathbf{a}'\mathbf{a} = \sum_{i=1}^{n} a_i^2$.

10. A column vector of order n postmultiplied by its transpose is a symmetric matrix of order $n \times n$ — from property 7 we have $(\mathbf{aa}')' = \mathbf{aa}'$.

• In particular, property 10 applies to V in definition I.4 and tells us that V is an n n symmetric matrix.
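These properties are easy to check numerically; a short R sketch with illustrative matrices:

    A <- matrix(1:6, nrow = 2)            # a 2 x 3 matrix
    B <- matrix(1:12, nrow = 3)           # a 3 x 4 matrix
    all.equal(t(A %*% B), t(B) %*% t(A))  # property 4: (AB)' = B'A'
    a <- c(3, 1, 4)
    drop(t(a) %*% a)                      # property 9: a'a = 26 = sum(a^2)
    a %*% t(a)                            # property 10: a 3 x 3 symmetric matrix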

Page 19: MATH 2016 (13177) Statistical Modelling

Model for expectation & variance

• Have the model for $\mathbf{Y}$ with conditions on $\boldsymbol{\varepsilon}$.
• Find expressions for the elements of $E[\mathbf{Y}]$ and $\operatorname{var}[\mathbf{Y}]$.
• Thus,

$E[Y_i] = E[\theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_p x_{ip} + \varepsilon_i]$

$\qquad = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_p x_{ip} + E[\varepsilon_i]$

$\qquad = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \cdots + \theta_p x_{ip}$

$\operatorname{var}[Y_i] = E\left[(Y_i - E[Y_i])^2\right]$

$\qquad = E\left[\left(\theta_0 + \theta_1 x_{i1} + \cdots + \theta_p x_{ip} + \varepsilon_i - (\theta_0 + \theta_1 x_{i1} + \cdots + \theta_p x_{ip})\right)^2\right]$

$\qquad = E[\varepsilon_i^2]$

$\qquad = \sigma^2, \quad \text{since } \operatorname{var}[\varepsilon_i] = E[\varepsilon_i^2] - \left(E[\varepsilon_i]\right)^2 = E[\varepsilon_i^2] = \sigma^2$

Page 20: MATH 2016 (13177) Statistical Modelling

Model in terms of expectation and variance (cont’d)

$\operatorname{cov}[Y_i, Y_j] = E\left[(Y_i - E[Y_i])(Y_j - E[Y_j])\right]$

$\qquad = E[\varepsilon_i \varepsilon_j] = \operatorname{cov}[\varepsilon_i, \varepsilon_j]$

$\qquad = 0, \quad i \neq j$

• In matrix terms, the alternative expression for the model is:

$E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta} \quad \text{and} \quad \operatorname{var}[\mathbf{Y}] = \mathbf{V} = \sigma^2 \mathbf{I}_n$

• That is, V is also the variance matrix for Y.

Page 21: MATH 2016 (13177) Statistical Modelling

Example I.1 House price

• Suppose it is thought that the price obtained for a house depends primarily on its age and livable area.

• Observe 5 randomly selected houses on the market:

Price ($'000) y   Age (years) x1   Area ('000 ft²) x2
50                1                1
40                5                1
52                5                2
47                10               2
65                20               3

• In this example, n = 5 and p = 2.

Page 22: MATH 2016 (13177) Statistical Modelling

Model proposed for data

$Y_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \varepsilon_i$

with $E[\varepsilon_i] = 0$, $\operatorname{var}[\varepsilon_i] = \sigma^2$ and $\operatorname{cov}[\varepsilon_i, \varepsilon_j] = 0$, $i \neq j$

• or, equivalently,

$E[Y_i] = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2}$

with $\operatorname{var}[Y_i] = \sigma^2$ and $\operatorname{cov}[Y_i, Y_j] = 0$, $i \neq j$

• In matrix terms, the model is:

$\mathbf{Y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$, with $E[\boldsymbol{\varepsilon}] = \mathbf{0}$ and $\operatorname{var}[\boldsymbol{\varepsilon}] = \mathbf{V} = \sigma^2\mathbf{I}_n$

• or, equivalently,

$E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ and $\operatorname{var}[\mathbf{Y}] = \mathbf{V} = \sigma^2\mathbf{I}_n$

Page 23: MATH 2016 (13177) Statistical Modelling

Model matrices for example

$\mathbf{Y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$, with $E[\boldsymbol{\varepsilon}] = \mathbf{0}$ and $\operatorname{var}[\boldsymbol{\varepsilon}] = \mathbf{V} = \sigma^2\mathbf{I}_n$, where

$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 5 & 2 \\ 1 & 10 & 2 \\ 1 & 20 & 3 \end{bmatrix}, \quad \boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \end{bmatrix}, \quad \mathbf{V} = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{bmatrix}$

• We also have the vector, $\mathbf{y}$, of observed values of $\mathbf{Y}$:

$\mathbf{y} = \begin{bmatrix} 50 \\ 40 \\ 52 \\ 47 \\ 65 \end{bmatrix}$
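A minimal R sketch that sets up these matrices (the slides use R later for the computations):

    y <- c(50, 40, 52, 47, 65)                            # observed prices, $'000
    X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))   # intercept, age, area
    n <- nrow(X)                                          # n = 5, q = p + 1 = 3
    # V = sigma^2 * diag(n) once an estimate of sigma^2 is available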

Page 24: MATH 2016 (13177) Statistical Modelling

Example I.2 Voter turnout

• In this example a political scientist attempted to investigate the relationship between campaign expenditures on televised advertisements and subsequent voter turnout.

• Aim to predict voter turnout from advertising expenditure.

Voter turnout (%)   Advert. expenditure   Voter turnout (%)   Advert. expenditure
35.4                28.5                  40.8                31.3
58.2                48.3                  61.9                50.1
46.1                40.2                  36.5                31.3
45.5                34.8                  32.7                24.8
64.8                50.1                  53.8                42.2
52.0                44.0                  24.6                23.0
37.9                27.2                  31.2                30.1
48.2                37.8                  42.6                36.5
41.8                27.2                  49.6                40.2
54.0                46.1                  56.6                46.1

Page 25: MATH 2016 (13177) Statistical Modelling

Proposed model

• Simple linear regression, as there is only 1 explanatory variable.
• Drop the subscript for the independent variable:

$E[Y_i] = \theta_0 + \theta_1 x_i$

with $\operatorname{var}[Y_i] = \sigma^2$ and $\operatorname{cov}[Y_i, Y_j] = 0$, $i \neq j$

• How should the data behave for this model?
• $E[Y_i]$ specifies the population mean.
• $\operatorname{var}[Y_i]$ specifies the variability around the population mean.
• $\operatorname{cov}[Y_i, Y_j]$ specifies the relationship between observations.

Page 26: MATH 2016 (13177) Statistical Modelling

Scatter diagram for Turnout versus Expend

• Does it look like the model will describe this situation?

[Scatter diagram: Voter turnout (20–65) plotted against Advertising expenditure (%) (20–55).]

Page 27: MATH 2016 (13177) Statistical Modelling


I.C Model selection

• Generally, we want to determine the model that best describes the data.

• To do this we usually obtain estimates of our parameters under several alternative models and use these in deciding which model to use to describe the data.

• The choice of models is often made using an analysis of variance (ANOVA).

Page 28: MATH 2016 (13177) Statistical Modelling

a) Obtaining parameter estimates

• Estimators of the parameters in the expectation model are obtained using the least squares or maximum likelihood criteria — they are equivalent in the context of linear models.

• Also, an estimator of $\sigma^2$ is obtained from the ANOVA described in the next section.

• Here we will establish the least squares estimators of $\boldsymbol{\theta}$.

Page 29: MATH 2016 (13177) Statistical Modelling

Least squares estimators

• Definition I.5: Let $\mathbf{Y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$ where
– $\mathbf{X}$ is an $n \times q$ matrix with $n \geq q$,
– $\boldsymbol{\theta}$ is a $q \times 1$ vector of unknown parameters,
– $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors with mean $\mathbf{0}$ and variance $\sigma^2\mathbf{I}_n$, and $q = p + 1$.
The ordinary least squares (OLS) estimator of $\boldsymbol{\theta}$ is the value of $\boldsymbol{\theta}$ that minimizes

$\boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = \sum_{i=1}^{n} \varepsilon_i^2$

• Note that
– $\boldsymbol{\varepsilon}'\boldsymbol{\varepsilon}$ is of the form described in property 9 of lemma I.1
– and is a scalar that is the sum of squares of the elements of $\boldsymbol{\varepsilon}$, or the sum of squares of the "errors".

Page 30: MATH 2016 (13177) Statistical Modelling

Least squares estimators of θ

• Theorem I.4: Let $\mathbf{Y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$ where
– $\mathbf{Y}$ is an $n \times 1$ vector of random variables for the observations,
– $\mathbf{X}$ is an $n \times q$ matrix of full rank with $n \geq q$,
– $\boldsymbol{\theta}$ is a $q \times 1$ vector of unknown parameters,
– $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors with mean $\mathbf{0}$ and variance $\sigma^2\mathbf{I}_n$, and $q = p + 1$.
The ordinary least squares estimator of $\boldsymbol{\theta}$ is given by

$\hat{\boldsymbol{\theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

• (The ‘^’ denotes estimator.)
• Proof: see notes.

Page 31: MATH 2016 (13177) Statistical Modelling

Least squares estimates of θ

• For a particular example, we will have an observed vector $\mathbf{y}$ — substitute this into the estimator to yield the estimate for that example:

$\hat{\boldsymbol{\theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

• Note the dual use of $\hat{\boldsymbol{\theta}}$ to denote the estimator and the estimate.
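As a sketch of theorem I.4 applied to example I.1, the estimate can be computed directly from the formula and checked against R's lm():

    y <- c(50, 40, 52, 47, 65)
    X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
    theta.hat <- solve(t(X) %*% X) %*% t(X) %*% y
    theta.hat                       # approx. 33.063, -0.190, 10.718
    coef(lm(y ~ X - 1))             # lm() reproduces the same estimates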

Page 32: MATH 2016 (13177) Statistical Modelling

What does full rank mean?

• Definition I.6: The rank of an $n \times q$ matrix $\mathbf{A}$ with $n \geq q$ is the number of linearly independent columns of the matrix. The matrix is said to be of full rank, or rank q, if none of the columns in the matrix can be written as a linear combination of the other columns.

• Example I.1 House price (continued): For this example the $\mathbf{X}$ matrix is

$\mathbf{X} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 5 & 2 \\ 1 & 10 & 2 \\ 1 & 20 & 3 \end{bmatrix}$

It is of rank 3 and is full rank, as no column can be written as a linear combination of the other two.

Page 33: MATH 2016 (13177) Statistical Modelling

Another example

• On the other hand, the following two matrices are of rank 2, as the second columns are 5(column 3) and 5(column 3) − 9(column 1), respectively:

$\mathbf{X} = \begin{bmatrix} 1 & 5 & 1 \\ 1 & 5 & 1 \\ 1 & 10 & 2 \\ 1 & 10 & 2 \\ 1 & 15 & 3 \end{bmatrix} \qquad \mathbf{X} = \begin{bmatrix} 1 & -4 & 1 \\ 1 & -4 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \\ 1 & 6 & 3 \end{bmatrix}$

Page 34: MATH 2016 (13177) Statistical Modelling

Fitted values and residuals

• Definition I.7: The estimator of the expected values for the expectation model $E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ is given by

$\hat{\boldsymbol{\psi}} = \mathbf{X}\hat{\boldsymbol{\theta}}$

The estimates of $\boldsymbol{\psi}$ for a particular observed $\mathbf{y}$ are called the fitted values. They are computed by substituting the values of the estimates of $\boldsymbol{\theta}$ and the explanatory variables into the fitted equation.

• Definition I.8: The estimator of the errors for the expectation model $E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ is given by

$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\theta}} = \mathbf{Y} - \hat{\boldsymbol{\psi}}$

and so the estimates, the residuals, are computed by subtracting the fitted values from the observed values of the response variable.
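In R, definitions I.7 and I.8 amount to two lines once the estimate is available; a sketch for example I.1:

    y <- c(50, 40, 52, 47, 65)
    X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
    theta.hat <- solve(t(X) %*% X, t(X) %*% y)   # solve the normal equations
    psi.hat <- X %*% theta.hat                   # fitted values
    e.hat <- y - psi.hat                         # residuals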

Page 35: MATH 2016 (13177) Statistical Modelling

Recap thus far

• Often want to decide between two models:
– Fit models using least squares.
– Want to use the ANOVA to select between alternatives.

• For the model $\mathbf{Y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$, or $E[\mathbf{Y}] = \mathbf{X}\boldsymbol{\theta}$ and $\mathbf{V} = \sigma^2\mathbf{I}$, the ordinary least squares estimator of $\boldsymbol{\theta}$ is given by

$\hat{\boldsymbol{\theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

• The estimator of the expected values is given by

$\hat{\boldsymbol{\psi}} = \mathbf{X}\hat{\boldsymbol{\theta}} = \mathbf{Q}_M\mathbf{Y} \quad \text{where } \mathbf{Q}_M = \,?$

• and of the errors is given by

$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\theta}} = \mathbf{Y} - \hat{\boldsymbol{\psi}} = \mathbf{Q}_R\mathbf{Y} \quad \text{where } \mathbf{Q}_R = \,?$

• Least squares can be viewed as the orthogonal projection of the data vector, in the n-dimensional data space, into both the model and residual subspaces using the Qs.

Page 36: MATH 2016 (13177) Statistical Modelling

Error estimator as a linear combination

• Given the expression for the estimator of the expected values, the estimator of the errors is given by

$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\theta}} = \mathbf{Y} - \mathbf{Q}_M\mathbf{Y} = (\mathbf{I}_n - \mathbf{Q}_M)\mathbf{Y} = \mathbf{Q}_R\mathbf{Y}$

• Hence the fitted values and residuals are given by

$\hat{\boldsymbol{\psi}} = \mathbf{Q}_M\mathbf{y} \quad \text{and} \quad \hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\boldsymbol{\psi}} = \mathbf{Q}_R\mathbf{y}$

Page 37: MATH 2016 (13177) Statistical Modelling

Projection operators — QM

• Seen that

$\hat{\boldsymbol{\psi}} = \mathbf{X}\hat{\boldsymbol{\theta}} = \mathbf{Q}_M\mathbf{Y} \quad \text{where } \mathbf{Q}_M = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$

• $\mathbf{Q}_M$ is an $n \times n$ projection matrix with the property that it is symmetric and idempotent.
• Definition I.9: A matrix $\mathbf{E}$ is idempotent if $\mathbf{E}^2 = \mathbf{E}$.
• Given that $\mathbf{X}$ is an $n \times q$ matrix,
– then $\mathbf{Q}_M = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$
– is the product of $n \times q$, $q \times q$ and $q \times n$ matrices,
– with the result that it is an $n \times n$ matrix.
• Clearly the product of the $n \times n$ matrix $\mathbf{Q}_M$ and the $n \times 1$ vector $\mathbf{Y}$ is an $n \times 1$ vector.
• So the estimator of the expected values is a linear combination of the elements of $\mathbf{Y}$.
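A quick numerical check of these properties for example I.1's X matrix, as an R sketch:

    X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
    QM <- X %*% solve(t(X) %*% X) %*% t(X)   # n x n projection matrix
    all.equal(QM, t(QM))                     # symmetric
    all.equal(QM %*% QM, QM)                 # idempotent: QM^2 = QM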

Page 38: MATH 2016 (13177) Statistical Modelling

Projection operators — QR

• Theorem I.5: Given that the matrix $\mathbf{E}$ is symmetric and idempotent, then $\mathbf{R} = \mathbf{I} - \mathbf{E}$ is also symmetric and idempotent. In addition, $\mathbf{RE} = \mathbf{ER} = \mathbf{0}$.

• Application of this theorem to the regression situation leads us to conclude that
– $\mathbf{Q}_R$ is symmetric and idempotent,
– with $\mathbf{Q}_R\mathbf{Q}_M = \mathbf{Q}_M\mathbf{Q}_R = \mathbf{0}$.

• All of this can be viewed as the orthogonal projection of vectors onto subspaces.
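Continuing the R sketch above, theorem I.5 can be verified numerically for $\mathbf{Q}_R = \mathbf{I}_n - \mathbf{Q}_M$:

    X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
    QM <- X %*% solve(t(X) %*% X) %*% t(X)
    QR <- diag(5) - QM
    all.equal(QR %*% QR, QR)   # QR is idempotent
    max(abs(QR %*% QM))        # QR QM = 0, up to rounding error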

Page 39: MATH 2016 (13177) Statistical Modelling

Geometry of least squares

• The observation vector $\mathbf{y}$ is viewed as a vector in n-space and this space is called the data space.

• Then the $\mathbf{X}$ matrix, with q linearly independent columns, determines a q-dimensional subspace of the data space — this space is called the model (sub)space.

Page 40: MATH 2016 (13177) Statistical Modelling

Geometry of least squares (cont’d)

• Fitted values are the orthogonal projection of the observation vector into the model space.
– The orthogonal projection is achieved using the idempotent, or projection, matrix $\mathbf{Q}_M$.

• Residuals are the projection of the observation vector into the residual subspace, the subspace of the data space orthogonal to the model space.
– The matrix that projects onto the residual subspace is $\mathbf{Q}_R$.

• That $\mathbf{Q}_R\mathbf{Q}_M = \mathbf{Q}_M\mathbf{Q}_R = \mathbf{0}$ reflects that the two subspaces are orthogonal.

Page 41: MATH 2016 (13177) Statistical Modelling

Projector properties

• $\mathbf{Q}_M^2 = \mathbf{Q}_M$ — obvious why:
– Once you have projected $\mathbf{y}$ into the model subspace and obtained $\mathbf{Q}_M\mathbf{y}$, it is in the model subspace.
– Applying $\mathbf{Q}_M$ to the fitted values, that is to $\mathbf{Q}_M\mathbf{y}$, will have no effect because they are already in the model subspace;
– clearly, $\mathbf{Q}_M^2\mathbf{y} = \mathbf{Q}_M(\mathbf{Q}_M\mathbf{y}) = \mathbf{Q}_M\mathbf{y}$.

• A similar argument applies to $\mathbf{Q}_R$.

• Also, it should be clear why $\mathbf{Q}_R\mathbf{Q}_M = \mathbf{0}$.

Page 42: MATH 2016 (13177) Statistical Modelling

Example I.3 Single sample

• Suppose that a single sample of 3 observations has been obtained.
• The linear model we propose for this data is that

$E[\mathbf{Y}] = \mathbf{X}_G\mu = \mathbf{1}_3\mu = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\mu \quad \text{and} \quad \operatorname{var}[\mathbf{Y}] = \sigma^2\mathbf{I}_n$

• or, for an individual observation,

$Y_i = \mu + \varepsilon_i \quad \text{with } \operatorname{var}[Y_i] = \sigma^2 \text{ and } \operatorname{cov}[Y_i, Y_j] = 0,\; i \neq j$

• That is, the value for an observation is made up of
– the population mean
– plus a particular deviation from the population mean for that observation.

Page 43: MATH 2016 (13177) Statistical Modelling

Projection matrix

• In this case $\mathbf{Q}_M$, a $3 \times 3$ matrix, is rather simple as

$\mathbf{Q}_M = \mathbf{X}_G(\mathbf{X}_G'\mathbf{X}_G)^{-1}\mathbf{X}_G' = \mathbf{1}_3(\mathbf{1}_3'\mathbf{1}_3)^{-1}\mathbf{1}_3' = \tfrac{1}{3}\mathbf{1}_3\mathbf{1}_3' = \tfrac{1}{3}\mathbf{J}_3 = \tfrac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

• and

$\hat{\boldsymbol{\psi}} = \mathbf{Q}_M\mathbf{Y} = \tfrac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ \bar{Y} \\ \bar{Y} \end{bmatrix}$

Page 44: MATH 2016 (13177) Statistical Modelling

Grand mean operator

• That is, in this case, $\mathbf{Q}_M$ is the matrix that replaces each observation with the grand mean of all the observations. Will call it $\mathbf{Q}_G$.

• Throughout this course the vector of grand means will be denoted as $\mathbf{G}$.

• Hence $\mathbf{G} = \bar{Y}\mathbf{1}_3$ and $\mathbf{g} = \bar{y}\mathbf{1}_3$.

• Note that
– the estimator of $\mu$ in our model is $\bar{Y}$, the mean of the elements of $\mathbf{Y}$,
– the estimate is $\bar{y}$, the mean of the observations.
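A sketch of the grand mean operator in R for n = 3 (the y used here anticipates the 3-D example on the next slide):

    XG <- matrix(1, nrow = 3, ncol = 1)          # X_G = 1_3
    QG <- XG %*% solve(t(XG) %*% XG) %*% t(XG)   # = J_3 / 3: every entry 1/3
    y <- c(2, 1, 2)
    QG %*% y                                     # each element is ybar = 5/3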

Page 45: MATH 2016 (13177) Statistical Modelling

A simple 3-D example

• Suppose that $\mathbf{y} = (2, 1, 2)'$.
• Then $\bar{y} = 5/3 = 1.67$
• and fitting the model $E[\mathbf{Y}] = \mathbf{1}_n\mu$ results in
– fitted values $\hat{\boldsymbol{\psi}} = (1.67, 1.67, 1.67)'$
– residuals $\hat{\boldsymbol{\varepsilon}} = (0.33, -0.67, 0.33)'$

[Figure: the data vector plotted in 3-D — the 1st data point on the axis coming out of the figure, the 2nd on the axis going across, the 3rd on the axis going up. The fitted vector lies in the model subspace; the residual vector lies in the residual subspace, orthogonal to the model subspace.]

Page 46: MATH 2016 (13177) Statistical Modelling

I.C Model selection

• Generally, we want to determine the model that best describes the data.
– Obtain estimates of our parameters under several alternative models.
– Choose the model using an analysis of variance (ANOVA).

Page 47: MATH 2016 (13177) Statistical Modelling


b) Regression analysis of variance

• An ANOVA is used to compare potential models.

• In the case of the regression model, it is common to want to choose between two expectation models, one which is a subset of the other.

Page 48: MATH 2016 (13177) Statistical Modelling

Testing all expectation parameters are zero

• The simplest, although not necessarily the most useful, situation is where one compares the expectation models

$E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} \quad \text{and} \quad E[Y_i] = 0$

• So we first state the null and alternative hypotheses for the hypothesis test:

H0: $\boldsymbol{\theta} = \mathbf{0}$ (equivalent to $E[Y_i] = 0$)

H1: $\boldsymbol{\theta} \neq \mathbf{0}$ (equivalent to $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$)

Page 49: MATH 2016 (13177) Statistical Modelling

Computing test statistic using ANOVA table

• Generally, an ANOVA comparing two models involves
– the SSqs of the estimators of $\boldsymbol{\psi}$ for the null model and
– the difference between the SSqs of the estimators of $\boldsymbol{\psi}$ for the two models.
• In this case,
– the estimators for the null model are all 0, and so the difference in SSqs is equal to the SSq of the estimators of $\boldsymbol{\psi}$ for the alternative model;
– could leave Model0 out of the table altogether.
• Note the use of $s^2$, the symbol for variance, for the MSqs — because MSqs are variances (the ratio of a SSq to its df).

Source                     DF      SSq                                                               MSq                                                                            F                P
Model0                     0       $0$                                                               $0$
Model1 − Model0 (Model)    $q$     $\hat{\boldsymbol{\psi}}'\hat{\boldsymbol{\psi}} - 0$             $s_M^2 = \hat{\boldsymbol{\psi}}'\hat{\boldsymbol{\psi}}/q$                    $s_M^2/s_R^2$    $\Pr(F_{q,\,n-q} \geq F_O)$
Residual                   $n-q$   $\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$   $s_R^2 = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}/(n-q)$
Total                      $n$     $\mathbf{Y}'\mathbf{Y}$

Page 50: MATH 2016 (13177) Statistical Modelling

Computing test statistic using ANOVA table (cont’d)

• Two parallel identities:
– Obviously, Total df = Model df + Residual df.
– Not so clear: Total SSq = Model SSq + Residual SSq (but remember the geometry).
• The SSqs are of the estimators of $\boldsymbol{\psi}$ and $\boldsymbol{\varepsilon}$, and of $\mathbf{Y}$.
• If the p-value is less than the significance level, $\alpha$, the H0 is rejected. Usually, $\alpha = 0.05$.

Source               DF      SSq                                                               MSq                                                                            F                P
Model (Regression)   $q$     $\hat{\boldsymbol{\psi}}'\hat{\boldsymbol{\psi}}$                 $s_M^2 = \hat{\boldsymbol{\psi}}'\hat{\boldsymbol{\psi}}/q$                    $s_M^2/s_R^2$    $\Pr(F_{q,\,n-q} \geq F_O)$
Residual             $n-q$   $\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$   $s_R^2 = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}/(n-q)$
Total                $n$     $\mathbf{Y}'\mathbf{Y}$

Page 51: MATH 2016 (13177) Statistical Modelling

Squared length of a vector = SSq of its elements

• From Pythagoras’ theorem,

$\|\mathbf{y}\|^2 = \|\hat{\boldsymbol{\psi}}\|^2 + \|\hat{\boldsymbol{\varepsilon}}\|^2$

• This is equivalent to the SSq identity

$\mathbf{y}'\mathbf{y} = \hat{\boldsymbol{\psi}}'\hat{\boldsymbol{\psi}} + \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$

Page 52: MATH 2016 (13177) Statistical Modelling

Example I.3 Single sample (continued)

• It is easy to verify that the squared lengths, or SSqs, are 9, 8.33 and 0.67 for the total, fitted and residual vectors, respectively.

• Because the data is very close to the fitted line, there is only a small vector in the residual space, with a small squared length.

• But the fitted values involve only 1 value and so have only 1 df, whereas the residuals have 2 independent values and so 2 df.

• Adjust by dividing each SSq by its df to yield mean squares: this results in 8.33 and 0.33. A bigger difference!
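These squared lengths are quick to confirm in R:

    y <- c(2, 1, 2)
    g <- rep(mean(y), 3)                       # fitted vector of grand means
    c(total = sum(y^2), fitted = sum(g^2), residual = sum((y - g)^2))
    # 9.00  8.33  0.67; the MSqs are 8.33/1 = 8.33 and 0.67/2 = 0.33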

Page 53: MATH 2016 (13177) Statistical Modelling

Example I.1 House price (continued)

• For this example, using the computer we find that

$\hat{\boldsymbol{\theta}} = \begin{bmatrix} 33.063 \\ -0.190 \\ 10.718 \end{bmatrix}$

• The estimated expected value is given by

$\hat{E}[Y] = 33.0626 - 0.1897 x_1 + 10.7182 x_2$

• In this case $\mathbf{Q}_M$, a $5 \times 5$ projection matrix, is somewhat more complicated. Using R:

$\mathbf{Q}_M = \begin{bmatrix} 0.448 & 0.359 & 0.249 & 0.138 & -0.193 \\ 0.359 & 0.683 & -0.245 & 0.160 & 0.042 \\ 0.249 & -0.245 & 0.805 & 0.188 & 0.004 \\ 0.138 & 0.160 & 0.188 & 0.215 & 0.298 \\ -0.193 & 0.042 & 0.004 & 0.298 & 0.849 \end{bmatrix}$

• Fitted values are obtained from
– the equation, or
– applying $\mathbf{Q}_M$ to $\mathbf{y}$.

Page 54: MATH 2016 (13177) Statistical Modelling

Fitted values and residuals

• Note that
– the observations are equal to the sums of the fitted values and the residuals,
– the sum of the last two SSqs is approximately equal to the Total SSq.

Observations y   Fitted values $\hat{\boldsymbol{\psi}} = \mathbf{Q}_M\mathbf{y}$   Residuals $\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \mathbf{Q}_M\mathbf{y} = \mathbf{Q}_R\mathbf{y}$
50               43.59116                                                           6.408840
40               42.83241                                                           -2.832413
52               53.55064                                                           -1.550645
47               52.60221                                                           -5.602210
65               61.42357                                                           3.576427
SSq              13142.32                                                           95.67588

Page 55: MATH 2016 (13177) Statistical Modelling

ANOVA table

• Note that the p-value is obtained using R.
• As the p-value is less than 0.05, the null hypothesis is rejected.
• The expectation model $E[Y_i] = 0$ does not provide as good a description of the data as the model $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$.

Source       DF   SSq        MSq       F       p
Regression   3    13142.32   4380.78   91.58   0.0108
Residual     2    95.68      47.84
Total        5    13238.00
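A sketch reproducing this table's SSqs, F and p-value in R from the projector $\mathbf{Q}_M$:

    y <- c(50, 40, 52, 47, 65)
    X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
    QM <- X %*% solve(t(X) %*% X) %*% t(X)
    psi.hat <- QM %*% y
    SSM <- sum(psi.hat^2)              # 13142.32, Model SSq on q = 3 df
    SSR <- sum((y - psi.hat)^2)        # 95.68, Residual SSq on n - q = 2 df
    Fobs <- (SSM / 3) / (SSR / 2)      # 91.58
    1 - pf(Fobs, 3, 2)                 # p = 0.0108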

Page 56: MATH 2016 (13177) Statistical Modelling

Testing that a subset of the expectation parameters are zero

• A more useful test involves testing that just some of the θs are zero.
• For example, in multiple linear regression you might want to choose between the expectation models
– $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$
– $E[Y_i] = \theta_0$
• Again, we first state the null and alternative hypotheses for the hypothesis test:
– H0: $\theta_1 = \theta_2 = 0$ (equivalent to $E[Y_i] = \theta_0$)
– H1: $\theta_1, \theta_2$ not both 0 (equivalent to $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$)

Page 57: MATH 2016 (13177) Statistical Modelling

Computing test statistic using ANOVA table

Source                     DF      SSq                                                                                               MSq       F                P
Model0                     1       $\mathbf{G}'\mathbf{G}$
Model1 − Model0            $q-1$   $\hat{\boldsymbol{\psi}}_1'\hat{\boldsymbol{\psi}}_1 - \mathbf{G}'\mathbf{G}$                     $s_M^2$   $s_M^2/s_R^2$    $\Pr(F_{q-1,\,n-q} \geq F_O)$
Residual                   $n-q$   $\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$                                   $s_R^2$
Total                      $n$     $\mathbf{Y}'\mathbf{Y}$

where $s_M^2 = (\hat{\boldsymbol{\psi}}_1'\hat{\boldsymbol{\psi}}_1 - \mathbf{G}'\mathbf{G})/(q-1)$ and $s_R^2 = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}/(n-q)$.

• The Model SSq does not look like a SSq, but a difference. Can show

$\hat{\boldsymbol{\psi}}_1'\hat{\boldsymbol{\psi}}_1 - \mathbf{G}'\mathbf{G} = (\hat{\boldsymbol{\psi}}_1 - \mathbf{G})'(\hat{\boldsymbol{\psi}}_1 - \mathbf{G})$

• Now the null model is not 0, but the grand mean model considered in Example I.3, Single sample.
– Showed that the estimated expected values under Model0 form the vector $\mathbf{G}$.

Page 58: MATH 2016 (13177) Statistical Modelling

Factoring out the intercept term

• It is so unusual to test a hypothesis about a model that does not include the intercept term that the SSq for the model involving only it is usually subtracted out of the ANOVA.
• Form the corrected Total SSq.
• Again, one can
– either subtract the grand mean from the observations and form the SSq of the deviations,
– or subtract the SSq for the grand mean model from the uncorrected Total SSq,
• because

$(\mathbf{Y} - \mathbf{G})'(\mathbf{Y} - \mathbf{G}) = \mathbf{Y}'\mathbf{Y} - \mathbf{G}'\mathbf{G}$

Page 59: MATH 2016 (13177) Statistical Modelling

Revised ANOVA table

• In this analysis, obtain SSqs of the following quantities:
– Model SSq: the differences between the estimators of the expected values for the two models in the hypotheses.
– Residual SSq: the estimators of the errors, obtained by subtracting the estimators of the expected values under the alternative model from the random vector $\mathbf{Y}$.
– (Corrected) Total SSq: the deviations from the grand mean, obtained by subtracting the grand mean estimator from the random vector $\mathbf{Y}$.

Source                     DF      SSq                                                                                                MSq       F                P
Model1 − Model0 (Model)    $q-1$   $(\hat{\boldsymbol{\psi}}_1 - \mathbf{G})'(\hat{\boldsymbol{\psi}}_1 - \mathbf{G})$                $s_M^2$   $s_M^2/s_R^2$    $\Pr(F_{q-1,\,n-q} \geq F_O)$
Residual                   $n-q$   $\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$                                    $s_R^2$
(Corrected) Total          $n-1$   $(\mathbf{Y} - \mathbf{G})'(\mathbf{Y} - \mathbf{G})$

where $s_M^2 = (\hat{\boldsymbol{\psi}}_1 - \mathbf{G})'(\hat{\boldsymbol{\psi}}_1 - \mathbf{G})/(q-1)$, $s_R^2 = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}/(n-q)$ and $\mathbf{G} = \bar{Y}\mathbf{1}_n$.

Page 60: MATH 2016 (13177) Statistical Modelling

Example I.1 House price (continued)

• Take the previously computed fitted values and residuals.
• Subtract $\mathbf{g} = 50.8 \times \mathbf{1}_5$ from the response variable and the fitted values to obtain:

Observations y   Deviations $\mathbf{y} - \mathbf{g}$   Fitted values $\hat{\boldsymbol{\psi}}_a$   Model differences $\hat{\boldsymbol{\psi}}_a - \mathbf{g}$   Residuals $\mathbf{y} - \hat{\boldsymbol{\psi}}_a$
50               -0.8                                   43.59116                                    -7.20884                                                      6.408840
40               -10.8                                  42.83241                                    -7.96759                                                      -2.832413
52               1.2                                    53.55064                                    2.75064                                                       -1.550645
47               -3.8                                   52.60221                                    1.80221                                                       -5.602210
65               14.2                                   61.42357                                    10.62357                                                      3.576427
SSq              334.8                                  13142.32                                    239.12409                                                     95.67588

• Note that
– the deviations are equal to the sum of the model differences and the residuals, and
– the sum of the last two sums of squares is approximately equal to the deviations sum of squares (239.124 + 95.676 = 334.8).

Page 61: MATH 2016 (13177) Statistical Modelling

ANOVA table for the example

• As the p-value is greater than 0.05, the null hypothesis cannot be rejected.
• The expectation model $E[Y_i] = \theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}$ does not describe the data any better than the model $E[Y_i] = \theta_0$.
• As the latter model is simpler, it will be adopted as the model that best describes the data.

Source       DF   SSq       MSq       F      p
Regression   2    239.124   119.562   2.50   0.2857
Residual     2    95.676    47.838
Total        4    334.800
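A sketch of the same corrected analysis using R's lm(), comparing the two expectation models directly:

    y <- c(50, 40, 52, 47, 65)
    x1 <- c(1, 5, 5, 10, 20)
    x2 <- c(1, 1, 2, 2, 3)
    fit0 <- lm(y ~ 1)            # null model: E[Y_i] = theta_0
    fit1 <- lm(y ~ x1 + x2)      # alternative model
    anova(fit0, fit1)            # F = 2.50 on 2 and 2 df, p = 0.2857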

Page 62: MATH 2016 (13177) Statistical Modelling


I.E Exercises

• There are exercises at the end of the chapter that review the material covered in this chapter.