
Page 1

I. Sample Geometry and Random Sampling

A. The Geometry of the Sample

Our sample data in matrix form looks like this:

$$
\mathbf{X}_{n \times p} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x}_1' \\
\mathbf{x}_2' \\
\vdots \\
\mathbf{x}_n'
\end{bmatrix}
\leftarrow \text{separate multivariate observations}
$$

Page 2

Just as the point where the population means of all p variables lie is the centroid of the population, the point where the sample means of all p variables lie is the centroid of the sample – for a sample with two variables and three observations:

$$
\mathbf{X}_{n \times p} =
\begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}
$$

we have

[Scatter diagram on axes $x_1$, $x_2$: the three points $(x_{11}, x_{12})$, $(x_{21}, x_{22})$, $(x_{31}, x_{32})$, with the row space centroid of the sample at $(\bar{x}_{\cdot 1}, \bar{x}_{\cdot 2})$.]

in the p = 2 variable or 'row' space (because rows are treated as vector coordinates).

Page 3

These same data

$$
\mathbf{X}_{n \times p} =
\begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}
$$

plotted in item or 'column' space, would look like this:

[Plot on axes 1, 2, 3: the two points $(x_{11}, x_{21}, x_{31})$ and $(x_{12}, x_{22}, x_{32})$, with the column space centroid of the sample at $(\bar{x}_{1 \cdot}, \bar{x}_{2 \cdot}, \bar{x}_{3 \cdot})$.]

This is referred to as the 'column' space because columns are treated as vector coordinates.

Page 4

Suppose we have the following data:

$$
\mathbf{X} =
\begin{bmatrix}
5 & 3 \\
-3 & 11
\end{bmatrix}
$$

In row space we have the following p = 2 dimensional scatter diagram:

[Scatter diagram on axes $x_1$, $x_2$: the points $(x_{11}, x_{12})$ and $(x_{21}, x_{22})$, with the row space centroid of the sample at (1, 7).]

with the obvious centroid (1, 7).

Page 5

For the same data:

$$
\mathbf{X} =
\begin{bmatrix}
5 & 3 \\
-3 & 11
\end{bmatrix}
$$

In column space we have the following n = 2 dimensional plot:

[Plot on axes 1, 2: the points $(x_{11}, x_{21})$ and $(x_{12}, x_{22})$, with the column space centroid of the sample at (4, 4).]

with the obvious centroid (4, 4).
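To make the two centroids concrete, here is a minimal NumPy sketch (mine, not part of the original slides) that computes both for the 2 x 2 example above; the variable names are my own.

```python
import numpy as np

X = np.array([[ 5.0,  3.0],
              [-3.0, 11.0]])

# Row space: each row is a point; the centroid is the vector of column means.
row_space_centroid = X.mean(axis=0)      # -> [1., 7.]

# Column space: each column is a point; the centroid is the vector of row means.
column_space_centroid = X.mean(axis=1)   # -> [4., 4.]

print(row_space_centroid, column_space_centroid)
```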

Page 6

Suppose we have the following data:

$$
\mathbf{X} =
\begin{bmatrix}
2 & 4 & 6 \\
1 & 7 & 1 \\
-6 & 1 & 8
\end{bmatrix}
$$

In row space we have the following p = 3 dimensional scatter diagram:

[Scatter diagram on axes $x_1$, $x_2$, $x_3$: the points $(x_{11}, x_{12}, x_{13})$, $(x_{21}, x_{22}, x_{23})$, $(x_{31}, x_{32}, x_{33})$, with the row space centroid of the sample at (-1, 4, 5).]

with the centroid (-1, 4, 5).

Page 7

For the same data:

$$
\mathbf{X} =
\begin{bmatrix}
2 & 4 & 6 \\
1 & 7 & 1 \\
-6 & 1 & 8
\end{bmatrix}
$$

In column space we have the following n = 3 dimensional scatter diagram:

[Scatter diagram on axes 1, 2, 3: the points $(x_{11}, x_{21}, x_{31})$, $(x_{12}, x_{22}, x_{32})$, $(x_{13}, x_{23}, x_{33})$, with the column space centroid of the sample at (4, 3, 1).]

with the centroid (4, 3, 1).

Page 8

The column space reveals an interesting geometric interpretation of the centroid – suppose we plot the n x 1 vector of ones, $\mathbf{1}$. In n = 3 dimensions we have:

[Plot on axes 1, 2, 3 of the vector $\mathbf{1} = (1, 1, 1)'$.]

This vector obviously forms equal angles with each of the n coordinate axes – this means normalization of this vector yields

$$
\frac{1}{\sqrt{n}}\,\mathbf{1}
$$

Page 9

Now consider some vector $\mathbf{y}_i$ of coordinates (that represent the n sample values of a random variable $X_i$). In n = 3 dimensions we have:

[Plot on axes 1, 2, 3 of the vector $\mathbf{y}_i = (x_{1i}, x_{2i}, x_{3i})'$.]

Page 10

The projection of $\mathbf{y}_i$ on the unit vector $\frac{1}{\sqrt{n}}\mathbf{1}$ is given by

$$
\mathbf{y}_i'\left(\frac{1}{\sqrt{n}}\mathbf{1}\right)\frac{1}{\sqrt{n}}\mathbf{1}
= \frac{\sum_{j=1}^{n} x_{ji}}{n}\,\mathbf{1}
= \bar{x}_i\,\mathbf{1}
$$

The sample mean $\bar{x}_i$ corresponds to the multiple of $\mathbf{1}$ necessary to generate the projection of $\mathbf{y}_i$ onto the line determined by $\mathbf{1}$! In n = 3 dimensions we have:

[Plot on axes 1, 2, 3 showing $\mathbf{y}_i$, the line determined by $\mathbf{1}$, and the projection $\bar{x}_i\mathbf{1}$.]
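A small sketch (my own, not from the slides) of this projection for one column of the earlier 3 x 3 example; it confirms that the projection of $\mathbf{y}_i$ onto the line spanned by $\mathbf{1}$ is $\bar{x}_i\mathbf{1}$.

```python
import numpy as np

y1 = np.array([2.0, 1.0, -6.0])   # first column of X: the observations on variable 1
ones = np.ones(3)

# Projection of y1 onto the line spanned by the ones vector.
projection = (y1 @ ones / (ones @ ones)) * ones

print(projection)        # [-1. -1. -1.]
print(y1.mean() * ones)  # identical: the sample mean times the ones vector
```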

Page 11

Again using the Pythagorean Theorem, we can show that the length of the vector drawn perpendicularly from the projection of $\mathbf{y}_i$ onto $\mathbf{1}$ up to $\mathbf{y}_i$ is $\|\mathbf{y}_i - \bar{x}_i\mathbf{1}\|$. This vector is often referred to as the deviation (or mean corrected) vector and is given by:

$$
\mathbf{d}_i = \mathbf{y}_i - \bar{x}_i\mathbf{1} =
\begin{bmatrix}
x_{1i} - \bar{x}_i \\
x_{2i} - \bar{x}_i \\
\vdots \\
x_{ni} - \bar{x}_i
\end{bmatrix}
$$

In n = 3 dimensions we have:

[Plot on axes 1, 2, 3 showing $\mathbf{y}_i$, its projection $\bar{x}_i\mathbf{1}$, and the deviation vector $\mathbf{y}_i - \bar{x}_i\mathbf{1}$.]

Page 12

Example: Consider our previous matrix of three observations in three-space:

$$
\mathbf{X} =
\begin{bmatrix}
2 & 4 & 6 \\
1 & 7 & 1 \\
-6 & 1 & 8
\end{bmatrix}
$$

This data has a mean vector of

$$
\bar{\mathbf{x}} =
\begin{bmatrix}
-1 \\
4 \\
5
\end{bmatrix}
$$

i.e., $\bar{x}_1 = -1.0$, $\bar{x}_2 = 4.0$, and $\bar{x}_3 = 5.0$.

Page 13

So we have

$$
\bar{x}_1\mathbf{1} = -1.0\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}-1\\-1\\-1\end{bmatrix},
\qquad
\bar{x}_2\mathbf{1} = 4.0\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}4\\4\\4\end{bmatrix},
\qquad
\bar{x}_3\mathbf{1} = 5.0\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}5\\5\\5\end{bmatrix}
$$

Page 14

Consequently,

$$
\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = \begin{bmatrix}2\\1\\-6\end{bmatrix} - \begin{bmatrix}-1\\-1\\-1\end{bmatrix} = \begin{bmatrix}3\\2\\-5\end{bmatrix},
\qquad
\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = \begin{bmatrix}4\\7\\1\end{bmatrix} - \begin{bmatrix}4\\4\\4\end{bmatrix} = \begin{bmatrix}0\\3\\-3\end{bmatrix},
\qquad
\mathbf{d}_3 = \mathbf{y}_3 - \bar{x}_3\mathbf{1} = \begin{bmatrix}6\\1\\8\end{bmatrix} - \begin{bmatrix}5\\5\\5\end{bmatrix} = \begin{bmatrix}1\\-4\\3\end{bmatrix}
$$

Note here that $\bar{x}_i\mathbf{1} \perp \mathbf{d}_i$, $i = 1, \ldots, p$.

Page 15

So the decomposition is

$$
\mathbf{y}_1 = \begin{bmatrix}2\\1\\-6\end{bmatrix} = \begin{bmatrix}-1\\-1\\-1\end{bmatrix} + \begin{bmatrix}3\\2\\-5\end{bmatrix},
\qquad
\mathbf{y}_2 = \begin{bmatrix}4\\7\\1\end{bmatrix} = \begin{bmatrix}4\\4\\4\end{bmatrix} + \begin{bmatrix}0\\3\\-3\end{bmatrix},
\qquad
\mathbf{y}_3 = \begin{bmatrix}6\\1\\8\end{bmatrix} = \begin{bmatrix}5\\5\\5\end{bmatrix} + \begin{bmatrix}1\\-4\\3\end{bmatrix}
$$
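As a check on the decomposition, a short NumPy sketch (my own illustration) that recomputes the deviation vectors and verifies that each mean vector $\bar{x}_i\mathbf{1}$ is perpendicular to its deviation vector $\mathbf{d}_i$.

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
n = X.shape[0]
ones = np.ones(n)
xbar = X.mean(axis=0)                 # [-1., 4., 5.]

D = X - np.outer(ones, xbar)          # columns of D are the deviation vectors d1, d2, d3
print(D.T)                            # [[3, 2, -5], [0, 3, -3], [1, -4, 3]]

# Orthogonality: (xbar_i * 1)' d_i = 0 for every variable i.
print([float(np.dot(xbar[i] * ones, D[:, i])) for i in range(n)])   # [0.0, 0.0, 0.0]
```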

Page 16

We are particularly interested in the deviation vectors

$$
\mathbf{d}_1 = \begin{bmatrix}3\\2\\-5\end{bmatrix},
\qquad
\mathbf{d}_2 = \begin{bmatrix}0\\3\\-3\end{bmatrix},
\qquad
\mathbf{d}_3 = \begin{bmatrix}1\\-4\\3\end{bmatrix}
$$

If we plot these deviation (or residual) vectors (translated to the origin without change in their lengths or directions):

[Plot on axes 1, 2, 3 of the three deviation vectors $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$ emanating from the origin.]

Page 17

Now consider the squared lengths of the deviation vectors:

$$
L_{\mathbf{d}_i}^2 = \mathbf{d}_i'\mathbf{d}_i = \sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2
$$

The squared length of a deviation vector is the sum of the squared deviations. Recalling that the sample variance is

$$
s_i^2 = \frac{\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2}{n}
$$

we can see that the squared length of a variable's deviation vector is proportional to that variable's variance (and so its length is proportional to the standard deviation)!

Page 18

Now consider any two deviation vectors. Their dot product is

$$
\mathbf{d}_i'\mathbf{d}_k = \sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right)
$$

which is simply a sum of crossproducts. Now let $\theta_{ik}$ denote the angle between these two deviation vectors. Recall that

$$
\cos\theta = \frac{\mathbf{x}'\mathbf{y}}{L_{\mathbf{x}}L_{\mathbf{y}}} = \frac{\mathbf{x}'\mathbf{y}}{\sqrt{\mathbf{x}'\mathbf{x}}\sqrt{\mathbf{y}'\mathbf{y}}}
$$

so by substitution we have that

$$
\mathbf{d}_i'\mathbf{d}_k = L_{\mathbf{d}_i}L_{\mathbf{d}_k}\cos\theta_{ik}
$$

Page 19

Another substitution, based on

$$
\mathbf{d}_i'\mathbf{d}_k = \sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right)
$$

and

$$
L_{\mathbf{d}_i}^2 = \mathbf{d}_i'\mathbf{d}_i = \sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2
$$

yields

$$
\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right)
= \sqrt{\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2}\;\sqrt{\sum_{j=1}^{n}\left(x_{jk} - \bar{x}_k\right)^2}\;\cos\theta_{ik}
$$

Page 20

A little algebra gives us

$$
\cos\theta_{ik}
= \frac{\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right)}
       {\sqrt{\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2}\,\sqrt{\sum_{j=1}^{n}\left(x_{jk} - \bar{x}_k\right)^2}}
= \frac{\dfrac{1}{n}\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_k\right)}
       {\sqrt{\dfrac{1}{n}\sum_{j=1}^{n}\left(x_{ji} - \bar{x}_i\right)^2}\,\sqrt{\dfrac{1}{n}\sum_{j=1}^{n}\left(x_{jk} - \bar{x}_k\right)^2}}
= \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}}
= r_{ik}
$$
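The identity $\cos\theta_{ik} = r_{ik}$ is easy to verify numerically; here is a brief sketch (my own, not from the slides) using the deviation vectors of the running example.

```python
import numpy as np

d1 = np.array([3.0, 2.0, -5.0])
d2 = np.array([0.0, 3.0, -3.0])

# Cosine of the angle between the two deviation vectors...
cos_theta = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))

# ...equals the sample correlation of the two underlying variables.
r12 = np.corrcoef(np.array([2.0, 1.0, -6.0]), np.array([4.0, 7.0, 1.0]))[0, 1]

print(cos_theta, r12)   # both approximately 0.803
```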

Page 21

Example: Consider our previous matrix of three observations in three-space:

$$
\mathbf{X} =
\begin{bmatrix}
2 & 4 & 6 \\
1 & 7 & 1 \\
-6 & 1 & 8
\end{bmatrix}
$$

which resulted in deviation vectors

$$
\mathbf{d}_1 = \begin{bmatrix}3\\2\\-5\end{bmatrix},
\qquad
\mathbf{d}_2 = \begin{bmatrix}0\\3\\-3\end{bmatrix},
\qquad
\mathbf{d}_3 = \begin{bmatrix}1\\-4\\3\end{bmatrix}
$$

Let's use these results to find the sample covariance and correlation matrices.

Page 22

We have:

$$
\mathbf{d}_1'\mathbf{d}_1 = \begin{bmatrix}3 & 2 & -5\end{bmatrix}\begin{bmatrix}3\\2\\-5\end{bmatrix} = 38 = 3s_{11}
\quad\Rightarrow\quad s_{11} = \frac{38}{3}
$$

$$
\mathbf{d}_2'\mathbf{d}_2 = \begin{bmatrix}0 & 3 & -3\end{bmatrix}\begin{bmatrix}0\\3\\-3\end{bmatrix} = 18 = 3s_{22}
\quad\Rightarrow\quad s_{22} = \frac{18}{3}
$$

$$
\mathbf{d}_3'\mathbf{d}_3 = \begin{bmatrix}1 & -4 & 3\end{bmatrix}\begin{bmatrix}1\\-4\\3\end{bmatrix} = 26 = 3s_{33}
\quad\Rightarrow\quad s_{33} = \frac{26}{3}
$$

Page 23

and:

$$
\mathbf{d}_1'\mathbf{d}_2 = \begin{bmatrix}3 & 2 & -5\end{bmatrix}\begin{bmatrix}0\\3\\-3\end{bmatrix} = 21 = 3s_{12}
\quad\Rightarrow\quad s_{12} = \frac{21}{3}
$$

$$
\mathbf{d}_1'\mathbf{d}_3 = \begin{bmatrix}3 & 2 & -5\end{bmatrix}\begin{bmatrix}1\\-4\\3\end{bmatrix} = -20 = 3s_{13}
\quad\Rightarrow\quad s_{13} = \frac{-20}{3}
$$

$$
\mathbf{d}_2'\mathbf{d}_3 = \begin{bmatrix}0 & 3 & -3\end{bmatrix}\begin{bmatrix}1\\-4\\3\end{bmatrix} = -21 = 3s_{23}
\quad\Rightarrow\quad s_{23} = \frac{-21}{3}
$$

Page 24

so:

$$
r_{12} = \frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} = \frac{21/3}{\sqrt{38/3}\sqrt{18/3}} = 0.803,
\qquad
r_{13} = \frac{s_{13}}{\sqrt{s_{11}}\sqrt{s_{33}}} = \frac{-20/3}{\sqrt{38/3}\sqrt{26/3}} = -0.636,
\qquad
r_{23} = \frac{s_{23}}{\sqrt{s_{22}}\sqrt{s_{33}}} = \frac{-21/3}{\sqrt{18/3}\sqrt{26/3}} = -0.971
$$

which gives us

$$
\mathbf{S}_n =
\begin{bmatrix}
\tfrac{38}{3} & \tfrac{21}{3} & \tfrac{-20}{3} \\
\tfrac{21}{3} & \tfrac{18}{3} & \tfrac{-21}{3} \\
\tfrac{-20}{3} & \tfrac{-21}{3} & \tfrac{26}{3}
\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{R} =
\begin{bmatrix}
1.000 & 0.803 & -0.636 \\
0.803 & 1.000 & -0.971 \\
-0.636 & -0.971 & 1.000
\end{bmatrix}
$$
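These hand calculations can be checked with NumPy; the sketch below (my own, not from the slides) rebuilds $\mathbf{S}_n$ (divisor n) and $\mathbf{R}$ for the example data.

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])

S_n = np.cov(X, rowvar=False, bias=True)   # divisor n, matching the slides' S_n
R = np.corrcoef(X, rowvar=False)

print(S_n)   # [[12.667, 7., -6.667], [7., 6., -7.], [-6.667, -7., 8.667]]
print(R)     # [[1., 0.803, -0.636], [0.803, 1., -0.971], [-0.636, -0.971, 1.]]
```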

Page 25

B. Random Samples and the Expected Values of $\bar{\mathbf{X}}$ and $\mathbf{S}_n$

Suppose we intend to collect n sets of measurements (or observations) on p variables. At this point we can consider each of the n x p values to be observed to be random variables $X_{jk}$. This leads to interpreting each set of measurements $\mathbf{X}_j$ on the p variables as a random vector, i.e.,

$$
\mathbf{X}_{n \times p} =
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}_1' \\
\mathbf{X}_2' \\
\vdots \\
\mathbf{X}_n'
\end{bmatrix}
\leftarrow \text{separate multivariate observations}
$$

These concepts will be used to define a random sample.

Page 26

Random Sample – if the row vectors $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ represent independent observations from a common joint probability distribution $f(\mathbf{x}) = f\left(x_1, x_2, \ldots, x_p\right)$, then $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ are said to form a random sample from $f(\mathbf{x})$.

This means the observations have a joint density function of

$$
\prod_{j=1}^{n} f\left(\mathbf{x}_j\right) = f\left(\mathbf{x}_1\right)f\left(\mathbf{x}_2\right)\cdots f\left(\mathbf{x}_n\right)
$$

where $f\left(\mathbf{x}_j\right) = f\left(x_{j1}, x_{j2}, \ldots, x_{jp}\right)$ is the density function for the jth row vector.

Page 27

Keep in mind two thoughts with regard to random samples:

- The measurements of the p variables in a single trial, $\mathbf{X}_j' = \left[X_{j1}, X_{j2}, \ldots, X_{jp}\right]$, will usually be correlated. The measurements from different trials, however, must be independent for inference to be valid.

- Independence of the measurements from different trials is often violated when the data have a serial component.

Page 28

Note that $\bar{\mathbf{X}}$ and $\mathbf{S}_n$ have certain properties no matter what the underlying joint distribution of the random variables is. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a joint distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. Then:

- $\bar{\mathbf{X}}$ is an unbiased estimate of $\boldsymbol{\mu}$, i.e., $E\left(\bar{\mathbf{X}}\right) = \boldsymbol{\mu}$, and has covariance matrix

$$
\operatorname{Cov}\left(\bar{\mathbf{X}}\right) = \frac{1}{n}\boldsymbol{\Sigma}
$$

- the sample covariance matrix $\mathbf{S}_n$ has expected value

$$
E\left(\mathbf{S}_n\right) = \frac{n-1}{n}\boldsymbol{\Sigma} = \boldsymbol{\Sigma} - \underbrace{\frac{1}{n}\boldsymbol{\Sigma}}_{\text{bias}}
$$

i.e., $\mathbf{S}_n$ is a biased estimator of the covariance matrix $\boldsymbol{\Sigma}$, but

$$
E\left(\frac{n}{n-1}\mathbf{S}_n\right) = \frac{n}{n-1}E\left(\mathbf{S}_n\right) = \boldsymbol{\Sigma}
$$

Page 29

This means we can write an unbiased sample variance-covariance matrix $\mathbf{S}$ as

$$
\mathbf{S} = \frac{n}{n-1}\mathbf{S}_n = \frac{1}{n-1}\sum_{j=1}^{n}\left(\mathbf{X}_j - \bar{\mathbf{X}}\right)\left(\mathbf{X}_j - \bar{\mathbf{X}}\right)'
$$

whose (i,k)th element is

$$
s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_{ji} - \bar{X}_i\right)\left(X_{jk} - \bar{X}_k\right)
$$

Page 30

Example: Consider our previous matrix of three observations in three-space:

$$
\mathbf{X} =
\begin{bmatrix}
2 & 4 & 6 \\
1 & 7 & 1 \\
-6 & 1 & 8
\end{bmatrix}
$$

The unbiased estimate $\mathbf{S}$ is

$$
\mathbf{S} = \frac{n}{n-1}\mathbf{S}_n = \frac{3}{3-1}
\begin{bmatrix}
\tfrac{38}{3} & \tfrac{21}{3} & \tfrac{-20}{3} \\
\tfrac{21}{3} & \tfrac{18}{3} & \tfrac{-21}{3} \\
\tfrac{-20}{3} & \tfrac{-21}{3} & \tfrac{26}{3}
\end{bmatrix}
=
\begin{bmatrix}
\tfrac{38}{2} & \tfrac{21}{2} & \tfrac{-20}{2} \\
\tfrac{21}{2} & \tfrac{18}{2} & \tfrac{-21}{2} \\
\tfrac{-20}{2} & \tfrac{-21}{2} & \tfrac{26}{2}
\end{bmatrix}
$$
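NumPy's default covariance uses the divisor n - 1, so it reproduces the unbiased S directly; a quick sketch (my own) for the same data:

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])

S = np.cov(X, rowvar=False)          # divisor n - 1 (the unbiased estimate)
print(S)                             # [[19., 10.5, -10.], [10.5, 9., -10.5], [-10., -10.5, 13.]]

# Same thing via the n/(n - 1) rescaling of S_n described above.
S_n = np.cov(X, rowvar=False, bias=True)
print(3 / 2 * S_n)
```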

Page 31

Notice that this does not change the sample correlation matrix R!

$$
\mathbf{R} =
\begin{bmatrix}
1.000 & 0.803 & -0.636 \\
0.803 & 1.000 & -0.971 \\
-0.636 & -0.971 & 1.000
\end{bmatrix}
$$

Why? (The factor n/(n - 1) multiplies every element of $\mathbf{S}_n$, so it cancels in each ratio $r_{ik} = s_{ik}/\sqrt{s_{ii}s_{kk}}$.)

Page 32

C. Generalizing Variance over p Dimensions

For a given variance-covariance matrix

$$
\mathbf{S} =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{12} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix},
\qquad
s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_{ji} - \bar{X}_i\right)\left(X_{jk} - \bar{X}_k\right)
$$

the Generalized Sample Variance is $|\mathbf{S}|$.

Page 33

Example: Consider our previous matrix of three observations in three-space:

$$
\mathbf{X} =
\begin{bmatrix}
2 & 4 & 6 \\
1 & 7 & 1 \\
-6 & 1 & 8
\end{bmatrix}
$$

Expanding a determinant along its first row works like this:

$$
|\mathbf{X}| = 2\begin{vmatrix}7 & 1\\ 1 & 8\end{vmatrix} - 4\begin{vmatrix}1 & 1\\ -6 & 8\end{vmatrix} + 6\begin{vmatrix}1 & 7\\ -6 & 1\end{vmatrix}
= 2(55) - 4(14) + 6(43) = 312
$$

Page 34

Of course, some of the information regarding the variances and covariances is lost when summarizing multiple dimensions with a single number.

Consider the geometry of $|\mathbf{S}|$ in two dimensions – the two variables generate two deviation vectors $\mathbf{d}_1$ and $\mathbf{d}_2$:

[Figure: the deviation vectors $\mathbf{d}_1$ and $\mathbf{d}_2$ with lengths $L_{\mathbf{d}_1}$ and $L_{\mathbf{d}_2}$, separated by the angle $\theta$; the height of $\mathbf{d}_1$ above $\mathbf{d}_2$ is $L_{\mathbf{d}_1}\sin\theta$.]

The resulting trapezoid has area $\left(L_{\mathbf{d}_1}\sin\theta\right)L_{\mathbf{d}_2}$.

Page 35

Because $\sin^2(\theta) + \cos^2(\theta) = 1$, we can rewrite the area of this trapezoid as

$$
L_{\mathbf{d}_1}\sin\theta\;L_{\mathbf{d}_2} = L_{\mathbf{d}_1}L_{\mathbf{d}_2}\sqrt{1 - \cos^2\theta}
$$

Earlier we showed that

$$
L_{\mathbf{d}_1} = \sqrt{\sum_{j=1}^{n}\left(x_{j1} - \bar{x}_1\right)^2} = \sqrt{(n-1)\,s_{11}},
\qquad
L_{\mathbf{d}_2} = \sqrt{\sum_{j=1}^{n}\left(x_{j2} - \bar{x}_2\right)^2} = \sqrt{(n-1)\,s_{22}}
$$

and $\cos(\theta) = r_{12}$.

Page 36

So by substitution

$$
L_{\mathbf{d}_1}\sin\theta\;L_{\mathbf{d}_2}
= L_{\mathbf{d}_1}L_{\mathbf{d}_2}\sqrt{1 - \cos^2\theta}
= \sqrt{(n-1)\,s_{11}}\,\sqrt{(n-1)\,s_{22}}\,\sqrt{1 - r_{12}^2}
= (n-1)\sqrt{s_{11}s_{22}\left(1 - r_{12}^2\right)}
$$

and we know that

$$
|\mathbf{S}| =
\begin{vmatrix}
s_{11} & s_{12} \\
s_{21} & s_{22}
\end{vmatrix}
=
\begin{vmatrix}
s_{11} & \sqrt{s_{11}}\sqrt{s_{22}}\,r_{12} \\
\sqrt{s_{11}}\sqrt{s_{22}}\,r_{12} & s_{22}
\end{vmatrix}
= s_{11}s_{22} - s_{11}s_{22}r_{12}^2
= s_{11}s_{22}\left(1 - r_{12}^2\right)
$$

So

$$
|\mathbf{S}| = \frac{(\text{area})^2}{(n-1)^2}
$$
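A numerical check of $|\mathbf{S}| = (\text{area})^2/(n-1)^2$ for two variables; this sketch (mine, not from the slides) uses the first two columns of the running example.

```python
import numpy as np

X2 = np.array([[ 2.0, 4.0],
               [ 1.0, 7.0],
               [-6.0, 1.0]])
n = X2.shape[0]

D = X2 - X2.mean(axis=0)             # columns are the deviation vectors
d1, d2 = D[:, 0], D[:, 1]

# Area of the figure generated by d1 and d2: |d1| |d2| sin(theta).
cos_t = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
area = np.linalg.norm(d1) * np.linalg.norm(d2) * np.sqrt(1 - cos_t**2)

S = np.cov(X2, rowvar=False)
print(np.linalg.det(S), area**2 / (n - 1) ** 2)   # both 60.75
```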

Page 37

More generally, we can establish the Generalized Sample Variance to be

$$
|\mathbf{S}| = \frac{(\text{volume})^2}{(n-1)^p}
$$

which simply means that the generalized sample variance (for a fixed set of data) is proportional to the squared volume generated by its p deviation vectors.

Note that

- the generalized sample variance increases as any deviation vector increases in length (the corresponding variable increases in variation)

- the generalized sample variance increases as the directions of any two deviation vectors become more dissimilar (the correlation of the corresponding variables decreases)

Page 38

Here we see how the generalized sample variance changes as the length of deviation vector $\mathbf{d}_2$ changes (the variation of the corresponding variable changes):

[Two plots on axes 1, 2, 3: the deviation vectors $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$, and the same vectors after $\mathbf{d}_2$ increases in length to $c\mathbf{d}_2$, $c > 1$ (i.e., the variance of $x_2$ increases).]

Page 39

Here we see the generalized sample variance decrease as the directions of deviation vectors $\mathbf{d}_2$ and $\mathbf{d}_3$ become more similar (the correlation of $x_2$ and $x_3$ increases):

[Two plots on axes 1, 2, 3: in the first, $\theta_{23} = 90°$, i.e., deviation vectors $\mathbf{d}_2$ and $\mathbf{d}_3$ are orthogonal ($x_2$ and $x_3$ are not correlated); in the second, $0° < \theta_{23} < 90°$, i.e., deviation vectors $\mathbf{d}_2$ and $\mathbf{d}_3$ point in similar directions ($x_2$ and $x_3$ are positively correlated).]

Page 40

This suggests an important result – the generalized sample variance is zero when and only when at least one deviation vector lies in the span of the other deviation vectors, i.e., when:

- one deviation vector is a linear combination of the other deviation vectors

- one variable is perfectly correlated with a linear combination of the other variables

- the rank of the data is less than the number of columns

- the determinant of the variance-covariance matrix is zero

Page 41

These results also suggest simple conditions for determining whether S is of full rank:

- If n ≤ p, then |S| = 0.

- For the p x 1 vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ representing realizations of the independent random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$, where $\mathbf{x}_j'$ is the jth row of the data matrix $\mathbf{X}$:

  - if the linear combination $\mathbf{a}'\mathbf{X}_j$ has positive variance for each constant vector $\mathbf{a} \neq \mathbf{0}$ and p < n, then S is of full rank and |S| > 0;

  - if $\mathbf{a}'\mathbf{X}_j$ is a constant for all j, then |S| = 0.
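The running 3 x 3 example illustrates the first condition: with n = p = 3 the three deviation vectors are linearly dependent (each is orthogonal to $\mathbf{1}$, so all three lie in an (n - 1) = 2 dimensional subspace), and therefore |S| = 0. A short sketch (my own check, not from the slides) confirms this numerically:

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])

S = np.cov(X, rowvar=False)
print(np.linalg.det(S))          # 0 up to floating-point rounding
print(np.linalg.matrix_rank(S))  # 2, i.e., S is not of full rank
```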

Page 42

Generalized Sample Variance also has a geometric interpretation in the p-dimensional scatter plot representation of the data in row space.

Consider the measure of distance of each point in row space from the sample centroid

$$
\bar{\mathbf{x}} =
\begin{bmatrix}
\bar{x}_1 \\
\bar{x}_2 \\
\vdots \\
\bar{x}_p
\end{bmatrix}
$$

with $\mathbf{S}^{-1}$ substituted for $\mathbf{A}$ in the statistical distance. Under these circumstances, the coordinates $\mathbf{x}$ that lie a constant distance c from the centroid must satisfy

$$
\left(\mathbf{x} - \bar{\mathbf{x}}\right)'\mathbf{S}^{-1}\left(\mathbf{x} - \bar{\mathbf{x}}\right) = c^2
$$

Page 43

A little integral calculus can be used to show that the volume of this ellipsoid is

$$
\operatorname{volume}\left\{\mathbf{x} : \left(\mathbf{x} - \bar{\mathbf{x}}\right)'\mathbf{S}^{-1}\left(\mathbf{x} - \bar{\mathbf{x}}\right) \le c^2\right\} = k_p\,|\mathbf{S}|^{1/2}\,c^p
$$

where

$$
k_p = \frac{2\pi^{p/2}}{p\,\Gamma\!\left(\frac{p}{2}\right)}
$$

Thus, the squared volume of the ellipsoid is equal to the product of some constant and the generalized sample variance.
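For reference, the constant $k_p$ is easy to evaluate; a small sketch (my own, not from the slides) using the standard gamma function:

```python
import math

def k_p(p: int) -> float:
    """Constant in the ellipsoid-volume formula: 2 * pi^(p/2) / (p * Gamma(p/2))."""
    return 2.0 * math.pi ** (p / 2) / (p * math.gamma(p / 2))

# p = 2 gives pi and p = 3 gives 4*pi/3, the familiar area/volume constants
# of the unit disk and unit ball.
print(k_p(2), math.pi)            # 3.1415... 3.1415...
print(k_p(3), 4 * math.pi / 3)    # 4.1887... 4.1887...
```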

Page 44

Example: Here we have three data sets, all with centroid (3, 3) and generalized variance |S| = 9.0.

Data Set A

$$
\mathbf{S} = \begin{bmatrix}5 & 4\\ 4 & 5\end{bmatrix},\quad r = 0.80
\;\rightarrow\;
\lambda_1 = 9,\ \lambda_2 = 1,\quad
\mathbf{e}_1 = \begin{bmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{bmatrix},\quad
\mathbf{e}_2 = \begin{bmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{bmatrix}
$$

[Scatter plot of the Data Set A observations in the $(x_1, x_2)$ plane, with the accompanying data listing.]

Page 45

Data Set B

$$
\mathbf{S} = \begin{bmatrix}3 & 0\\ 0 & 3\end{bmatrix},\quad r = 0.00
\;\rightarrow\;
\lambda_1 = 3,\ \lambda_2 = 3,\quad
\mathbf{e}_1 = \begin{bmatrix}1\\ 0\end{bmatrix},\quad
\mathbf{e}_2 = \begin{bmatrix}0\\ 1\end{bmatrix}
$$

[Scatter plot of the Data Set B observations in the $(x_1, x_2)$ plane, with the accompanying data listing.]

Page 46

Data Set C

$$
\mathbf{S} = \begin{bmatrix}5 & -4\\ -4 & 5\end{bmatrix},\quad r = -0.80
\;\rightarrow\;
\lambda_1 = 9,\ \lambda_2 = 1,\quad
\mathbf{e}_1 = \begin{bmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{bmatrix},\quad
\mathbf{e}_2 = \begin{bmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{bmatrix}
$$

[Scatter plot of the Data Set C observations in the $(x_1, x_2)$ plane, with the accompanying data listing.]

Page 47

Other measures of Generalized Variance have been suggested, based on:

- the variance-covariance matrix of the standardized variables, i.e., |R| – this ignores differences in the variances of the individual variables

- the total sample variance, i.e.,

$$
s_{11} + s_{22} + \cdots + s_{pp} = \sum_{i=1}^{p} s_{ii}
$$

  – this ignores the pairwise correlations between variables
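A short sketch (my own, not from the slides) contrasting the three summaries for Data Sets A and B above: identical |S|, but different |R| and total sample variance.

```python
import numpy as np

S_A = np.array([[5.0, 4.0], [4.0, 5.0]])
S_B = np.array([[3.0, 0.0], [0.0, 3.0]])

for name, S in [("A", S_A), ("B", S_B)]:
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)        # correlation matrix implied by S
    print(name, np.linalg.det(S), np.linalg.det(R), np.trace(S))
# A: |S| = 9.0, |R| = 0.36, total variance = 10.0
# B: |S| = 9.0, |R| = 1.00, total variance =  6.0
```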

Page 48

D. Matrix Operations for Calculating Sample Means, Covariances, and Correlations

For a given data matrix X

$$
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_p
\end{bmatrix}
$$

we have that

$$
\bar{\mathbf{x}} =
\begin{bmatrix}
\mathbf{y}_1'\mathbf{1}/n \\
\mathbf{y}_2'\mathbf{1}/n \\
\vdots \\
\mathbf{y}_p'\mathbf{1}/n
\end{bmatrix}
= \frac{1}{n}
\begin{bmatrix}
x_{11} + x_{21} + \cdots + x_{n1} \\
x_{12} + x_{22} + \cdots + x_{n2} \\
\vdots \\
x_{1p} + x_{2p} + \cdots + x_{np}
\end{bmatrix}
= \frac{1}{n}\mathbf{X}'\mathbf{1}
$$

Page 49

We can also create an n x p matrix of means:

$$
\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X} =
\begin{bmatrix}
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\vdots & \vdots & & \vdots \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p
\end{bmatrix}
$$

If we subtract this result from the data matrix X we have

$$
\mathbf{X} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X} =
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
$$

which is an n x p matrix of deviations!

Page 50

Now the matrix (n – 1)S of sums of squares and crossproducts is

$$
(n-1)\mathbf{S}
= \left(\mathbf{X} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)'\left(\mathbf{X} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)
$$

$$
= \begin{bmatrix}
x_{11} - \bar{x}_1 & x_{21} - \bar{x}_1 & \cdots & x_{n1} - \bar{x}_1 \\
x_{12} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{n2} - \bar{x}_2 \\
\vdots & \vdots & & \vdots \\
x_{1p} - \bar{x}_p & x_{2p} - \bar{x}_p & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
$$

$$
= \mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}
$$

Page 51

So the unbiased sample variance-covariance matrix S is

$$
\mathbf{S} = \frac{1}{n-1}\,\mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}
$$

Similarly, the common biased sample variance-covariance matrix $\mathbf{S}_n$ is

$$
\mathbf{S}_n = \frac{1}{n}\,\mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}
$$
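These matrix identities translate directly into code; a compact sketch (my own, not from the slides) that builds S and S_n from the centering matrix for the example data:

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
n = X.shape[0]

C = np.eye(n) - np.ones((n, n)) / n     # centering matrix I - (1/n) 1 1'
S   = X.T @ C @ X / (n - 1)             # unbiased variance-covariance matrix
S_n = X.T @ C @ X / n                   # biased version

print(np.allclose(S, np.cov(X, rowvar=False)))               # True
print(np.allclose(S_n, np.cov(X, rowvar=False, bias=True)))  # True
```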

Page 52

If we substitute zeros for the off-diagonal elements of the variance-covariance matrix S and take the square root of each element of the resulting matrix, we get the standard deviation matrix

$$
\mathbf{D}^{1/2} =
\begin{bmatrix}
\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{s_{pp}}
\end{bmatrix}
$$

whose inverse is

$$
\mathbf{D}^{-1/2} =
\begin{bmatrix}
\dfrac{1}{\sqrt{s_{11}}} & 0 & \cdots & 0 \\
0 & \dfrac{1}{\sqrt{s_{22}}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \dfrac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
$$

Page 53

Now since

$$
\mathbf{S} =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{12} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix}
$$

and

$$
\mathbf{R} =
\begin{bmatrix}
\dfrac{s_{11}}{\sqrt{s_{11}}\sqrt{s_{11}}} & \dfrac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{1p}}{\sqrt{s_{11}}\sqrt{s_{pp}}} \\
\dfrac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} & \dfrac{s_{22}}{\sqrt{s_{22}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{2p}}{\sqrt{s_{22}}\sqrt{s_{pp}}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{s_{1p}}{\sqrt{s_{11}}\sqrt{s_{pp}}} & \dfrac{s_{2p}}{\sqrt{s_{22}}\sqrt{s_{pp}}} & \cdots & \dfrac{s_{pp}}{\sqrt{s_{pp}}\sqrt{s_{pp}}}
\end{bmatrix}
$$

Page 54

we have

$$
\mathbf{R} = \mathbf{D}^{-1/2}\,\mathbf{S}\,\mathbf{D}^{-1/2}
$$

which can be manipulated to show that the sample variance-covariance matrix S is a function of the sample correlation matrix R:

$$
\mathbf{S} = \mathbf{D}^{1/2}\,\mathbf{R}\,\mathbf{D}^{1/2}
$$
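Both directions of this relationship are a one-liner in NumPy; a minimal sketch (my own), continuing the same example:

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])

S = np.cov(X, rowvar=False)
D_half = np.diag(np.sqrt(np.diag(S)))           # standard deviation matrix D^{1/2}
D_half_inv = np.linalg.inv(D_half)

R = D_half_inv @ S @ D_half_inv                 # R = D^{-1/2} S D^{-1/2}
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True

S_back = D_half @ R @ D_half                    # S = D^{1/2} R D^{1/2}
print(np.allclose(S_back, S))                   # True
```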

Page 55

E. Sample Values of Linear Combinations of Variables

For some linear combination of the p variables

$$
\mathbf{c}'\mathbf{X} = \sum_{i=1}^{p} c_i X_i
$$

whose observed value on the jth trial is

$$
\mathbf{c}'\mathbf{x}_j = \sum_{i=1}^{p} c_i x_{ji},\qquad j = 1, \ldots, n
$$

the n derived observations have

$$
\text{sample mean} = \mathbf{c}'\bar{\mathbf{x}},
\qquad
\text{sample variance} = \mathbf{c}'\mathbf{S}\mathbf{c}
$$

Page 56

If we have a second linear combination of these p variables

$$
\mathbf{b}'\mathbf{X} = \sum_{i=1}^{p} b_i X_i
$$

whose observed value on the jth trial is

$$
\mathbf{b}'\mathbf{x}_j = \sum_{i=1}^{p} b_i x_{ji},\qquad j = 1, \ldots, n
$$

then the two linear combinations have

$$
\text{sample covariance} = \mathbf{b}'\mathbf{S}\mathbf{c} = \mathbf{c}'\mathbf{S}\mathbf{b}
$$

Page 57

If we have a q x p matrix A whose kth row contains the coefficients of a linear combination of these p variables, then these q linear combinations have

$$
\text{sample mean vector} = \mathbf{A}\bar{\mathbf{x}},
\qquad
\text{sample variance-covariance matrix} = \mathbf{A}\mathbf{S}\mathbf{A}'
$$
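To close, a brief sketch (my own, with arbitrary illustrative coefficient rows in A) of the sample mean vector and variance-covariance matrix of q linear combinations via A x̄ and A S A':

```python
import numpy as np

X = np.array([[ 2.0, 4.0, 6.0],
              [ 1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
A = np.array([[1.0,  1.0, 1.0],     # two illustrative linear combinations (q = 2)
              [1.0, -1.0, 0.0]])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

# Sample mean vector and variance-covariance matrix of the derived observations.
print(A @ xbar)                     # A x̄
print(A @ S @ A.T)                  # A S A'

# Check against the derived observations themselves.
Z = X @ A.T                         # each row holds the q combinations for one trial
print(Z.mean(axis=0), np.cov(Z, rowvar=False))
```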