Marginal and Conditional distributions

Upload: sara-mason

Post on 30-Dec-2015


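Theorem: (Marginal distributions for the Multivariate Normal distribution)

Let

$$\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}, \qquad \mathbf{x}_1 \text{ is } q \times 1, \quad \mathbf{x}_2 \text{ is } (p-q) \times 1$$

have p-variate Normal distribution with mean vector

$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}$$

and Covariance matrix

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}$$

Then the marginal distribution of $\mathbf{x}_i$ is $q_i$-variate Normal distribution ($q_1 = q$, $q_2 = p - q$) with mean vector $\boldsymbol{\mu}_i$ and Covariance matrix $\Sigma_{ii}$.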
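The marginal theorem says that the marginal parameters are obtained simply by selecting the corresponding entries of the mean vector and the corresponding block of the covariance matrix. A minimal sketch in Python with NumPy follows; the function name `marginal_mvn` and the 3-variate numbers are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def marginal_mvn(mu, sigma, idx):
    """Return (mean vector, covariance matrix) of the sub-vector x[idx].

    Hypothetical helper: picks the entries of mu and the block of sigma
    that belong to the chosen components.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    idx = np.asarray(idx)
    return mu[idx], sigma[np.ix_(idx, idx)]

# Illustrative (made-up) 3-variate parameters
mu = [1.0, 2.0, 3.0]
sigma = [[2.0, 0.5, 0.3],
         [0.5, 1.0, 0.2],
         [0.3, 0.2, 1.5]]

# Marginal distribution of the first two components
mu_1, sigma_11 = marginal_mvn(mu, sigma, [0, 1])
print(mu_1)
print(sigma_11)
```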

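Theorem: (Conditional distributions for the Multivariate Normal distribution)

Let

$$\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}, \qquad \mathbf{x}_1 \text{ is } q \times 1, \quad \mathbf{x}_2 \text{ is } (p-q) \times 1$$

have p-variate Normal distribution with mean vector

$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}$$

and Covariance matrix

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}$$

Then the conditional distribution of $\mathbf{x}_i$ given $\mathbf{x}_j$ is $q_i$-variate Normal distribution with mean vector

$$\boldsymbol{\mu}_{i \cdot j} = \boldsymbol{\mu}_i + \Sigma_{ij} \Sigma_{jj}^{-1} \left( \mathbf{x}_j - \boldsymbol{\mu}_j \right)$$

and Covariance matrix

$$\Sigma_{ii \cdot j} = \Sigma_{ii} - \Sigma_{ij} \Sigma_{jj}^{-1} \Sigma_{ij}'$$

where $\Sigma_{21} = \Sigma_{12}'$.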
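The conditional mean and covariance in this theorem translate directly into a few lines of linear algebra. The sketch below conditions the last p - q components on the first q; the function name `conditional_mvn` and the small bivariate input are assumptions made for illustration only.

```python
import numpy as np

def conditional_mvn(mu, sigma, q, x1):
    """Parameters of x_2 given x_1 = x1, where x_1 is the first q components.

    Hypothetical helper implementing
        mean = mu_2 + Sigma_21 Sigma_11^{-1} (x1 - mu_1)
        cov  = Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    mu1, mu2 = mu[:q], mu[q:]
    s11, s12, s22 = sigma[:q, :q], sigma[:q, q:], sigma[q:, q:]
    # Sigma_21 Sigma_11^{-1} = (Sigma_11^{-1} Sigma_12)' because Sigma_11 is symmetric
    beta = np.linalg.solve(s11, s12).T
    cond_mean = mu2 + beta @ (x1 - mu1)
    cond_cov = s22 - beta @ s12
    return cond_mean, cond_cov

# Illustrative bivariate case: unit variances, correlation 0.5, observe x_1 = 2
cond_mean, cond_cov = conditional_mvn([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], 1, [2.0])
print(cond_mean, cond_cov)
```

In this bivariate case the formulas reduce to the familiar mean $\rho x_1$ and variance $1 - \rho^2$.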

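The matrix

$$\Sigma_{2 \cdot 1} = \Sigma_{22 \cdot 1} = \Sigma_{22} - \Sigma_{12}' \Sigma_{11}^{-1} \Sigma_{12}$$

is called the matrix of partial variances and covariances.

The $(i, j)$th element of the matrix $\Sigma_{2 \cdot 1}$,

$$\sigma_{ij \cdot 1,2,\dots,q}$$

is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \dots, x_q$.

$$\rho_{ij \cdot 1,2,\dots,q} = \frac{\sigma_{ij \cdot 1,2,\dots,q}}{\sqrt{\sigma_{ii \cdot 1,2,\dots,q} \; \sigma_{jj \cdot 1,2,\dots,q}}}$$

is called the partial correlation between $x_i$ and $x_j$ given $x_1, \dots, x_q$.

The matrix

$$\beta = \Sigma_{12}' \Sigma_{11}^{-1}$$

is called the matrix of regression coefficients for predicting $x_{q+1}, x_{q+2}, \dots, x_p$ from $x_1, \dots, x_q$.

Mean vector of $x_{q+1}, x_{q+2}, \dots, x_p$ given $x_1, \dots, x_q$ is:

$$\boldsymbol{\mu}_{2 \cdot 1} = \boldsymbol{\mu}_2 + \Sigma_{12}' \Sigma_{11}^{-1} \left( \mathbf{x}_1 - \boldsymbol{\mu}_1 \right) = \beta \mathbf{x}_1 + \boldsymbol{\alpha}, \qquad \text{where } \boldsymbol{\alpha} = \boldsymbol{\mu}_2 - \beta \boldsymbol{\mu}_1$$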
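To make the definition of partial correlation concrete, the sketch below forms the matrix of partial variances and covariances and then rescales it to correlations. The function name `partial_correlations` and the equicorrelated 3 x 3 covariance matrix are illustrative assumptions.

```python
import numpy as np

def partial_correlations(sigma, q):
    """Partial correlation matrix of x_{q+1},...,x_p given x_1,...,x_q.

    Hypothetical helper: computes Sigma_22 - Sigma_12' Sigma_11^{-1} Sigma_12
    and divides each entry by the square roots of the matching diagonal entries.
    """
    sigma = np.asarray(sigma, dtype=float)
    s11, s12, s22 = sigma[:q, :q], sigma[:q, q:], sigma[q:, q:]
    partial_cov = s22 - s12.T @ np.linalg.solve(s11, s12)
    sd = np.sqrt(np.diag(partial_cov))
    return partial_cov / np.outer(sd, sd)

# Illustrative covariance matrix: unit variances, all correlations 0.5
sigma = [[1.0, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 1.0]]

# Partial correlation between x2 and x3 given x1
R = partial_correlations(sigma, q=1)
print(R)
```

Here the partial covariance matrix has 0.75 on the diagonal and 0.25 off the diagonal, so the partial correlation between x2 and x3 given x1 is 1/3, smaller than the bivariate correlation 0.5.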

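Example:

Suppose that

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$

is 4-variate normal with

$$\boldsymbol{\mu} = \begin{bmatrix} 10 \\ 15 \\ 6 \\ 14 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} 4 & 2 & 4 & 2 \\ 2 & 17 & 6 & 5 \\ 4 & 6 & 14 & 6 \\ 2 & 5 & 6 & 7 \end{bmatrix}$$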

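The marginal distribution of

$$\mathbf{x}_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

is bivariate normal with

$$\boldsymbol{\mu}_1 = \begin{bmatrix} 10 \\ 15 \end{bmatrix} \quad \text{and} \quad \Sigma_{11} = \begin{bmatrix} 4 & 2 \\ 2 & 17 \end{bmatrix}$$

The marginal distribution of

$$\mathbf{x}_1 = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

is trivariate normal with

$$\boldsymbol{\mu}_1 = \begin{bmatrix} 10 \\ 15 \\ 6 \end{bmatrix} \quad \text{and} \quad \Sigma_{11} = \begin{bmatrix} 4 & 2 & 4 \\ 2 & 17 & 6 \\ 4 & 6 & 14 \end{bmatrix}$$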

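Find the conditional distribution of

$$\mathbf{x}_2 = \begin{bmatrix} x_3 \\ x_4 \end{bmatrix} \quad \text{given} \quad \mathbf{x}_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 15 \\ 5 \end{bmatrix}$$

Now

$$\boldsymbol{\mu}_1 = \begin{bmatrix} 10 \\ 15 \end{bmatrix}, \quad \boldsymbol{\mu}_2 = \begin{bmatrix} 6 \\ 14 \end{bmatrix}, \quad \Sigma_{11} = \begin{bmatrix} 4 & 2 \\ 2 & 17 \end{bmatrix}, \quad \Sigma_{22} = \begin{bmatrix} 14 & 6 \\ 6 & 7 \end{bmatrix} \quad \text{and} \quad \Sigma_{12} = \begin{bmatrix} 4 & 2 \\ 6 & 5 \end{bmatrix}$$

$$\Sigma_{2 \cdot 1} = \Sigma_{22} - \Sigma_{12}' \Sigma_{11}^{-1} \Sigma_{12} = \begin{bmatrix} 14 & 6 \\ 6 & 7 \end{bmatrix} - \begin{bmatrix} 4 & 6 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 4 & 2 \\ 2 & 17 \end{bmatrix}^{-1} \begin{bmatrix} 4 & 2 \\ 6 & 5 \end{bmatrix} = \begin{bmatrix} 9 & 3 \\ 3 & 5 \end{bmatrix}$$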

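The matrix of regression coefficients for predicting $x_3, x_4$ from $x_1, x_2$:

$$\beta = \Sigma_{12}' \Sigma_{11}^{-1} = \begin{bmatrix} 4 & 6 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 4 & 2 \\ 2 & 17 \end{bmatrix}^{-1} = \begin{bmatrix} 0.875 & 0.250 \\ 0.375 & 0.250 \end{bmatrix}$$

$$\boldsymbol{\alpha} = \boldsymbol{\mu}_2 - \beta \boldsymbol{\mu}_1 = \begin{bmatrix} 6 \\ 14 \end{bmatrix} - \begin{bmatrix} 0.875 & 0.250 \\ 0.375 & 0.250 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \end{bmatrix} = \begin{bmatrix} -6.5 \\ 6.5 \end{bmatrix}$$

$$\boldsymbol{\mu}_{2 \cdot 1} = \beta \mathbf{x}_1 + \boldsymbol{\alpha} = \begin{bmatrix} 0.875\, x_1 + 0.250\, x_2 - 6.5 \\ 0.375\, x_1 + 0.250\, x_2 + 6.5 \end{bmatrix}$$

With $x_1 = 15$ and $x_2 = 5$:

$$\boldsymbol{\mu}_{2 \cdot 1} = \begin{bmatrix} 0.875(15) + 0.250(5) - 6.5 \\ 0.375(15) + 0.250(5) + 6.5 \end{bmatrix} = \begin{bmatrix} 7.875 \\ 13.375 \end{bmatrix}$$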

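Thus the conditional distribution of

$$\mathbf{x}_2 = \begin{bmatrix} x_3 \\ x_4 \end{bmatrix} \quad \text{given} \quad \mathbf{x}_1 = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 15 \\ 5 \end{bmatrix}$$

is bivariate Normal with mean vector

$$\boldsymbol{\mu}_{2 \cdot 1} = \begin{bmatrix} 7.875 \\ 13.375 \end{bmatrix}$$

and partial covariance matrix

$$\Sigma_{2 \cdot 1} = \begin{bmatrix} 9 & 3 \\ 3 & 5 \end{bmatrix}$$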
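The arithmetic of this example can be checked numerically. The sketch below uses the mean vector, covariance matrix and observed values from the example; the variable names are assumptions chosen for readability.

```python
import numpy as np

# Parameters of the 4-variate normal distribution in the example
mu = np.array([10.0, 15.0, 6.0, 14.0])
sigma = np.array([[4.0, 2.0, 4.0, 2.0],
                  [2.0, 17.0, 6.0, 5.0],
                  [4.0, 6.0, 14.0, 6.0],
                  [2.0, 5.0, 6.0, 7.0]])
q = 2                          # condition on the first two components
x1_obs = np.array([15.0, 5.0])  # observed values of x1, x2

s11, s12, s22 = sigma[:q, :q], sigma[:q, q:], sigma[q:, q:]

beta = s12.T @ np.linalg.inv(s11)   # regression coefficients
alpha = mu[q:] - beta @ mu[:q]      # intercept vector
cond_mean = beta @ x1_obs + alpha   # conditional mean of (x3, x4)
cond_cov = s22 - beta @ s12         # partial covariance matrix

print(beta)
print(alpha)
print(cond_mean)
print(cond_cov)
```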

Using SPSS

Note: The use of another statistical package such as Minitab is similar to using SPSS

The first step is to input the data.

The data is usually contained in some type of file.

1. Text files

2. Excel files

3. Other types of files

After starting the SPSS program the following dialogue box appears:

If you select Opening an existing file and press OK, the following dialogue box appears:

Once you have selected the file and its type, the following dialogue box appears:

If the variable names are in the file, ask the program to read the names. If you do not specify the Range, the program will identify the Range.

Once you click OK, two windows will appear: one containing the output, the other containing the data.

To perform any statistical analysis, select the Analyze menu.

To compute correlations, select Correlate then Bivariate.

To compute partial correlations, select Correlate then Partial.

For Bivariate correlation the following dialogue appears:

The output for Bivariate correlation:

Correlations

                              AGE       HT        WT        CHL       ALB       CA        UA
AGE   Pearson Correlation    1.000     .080      .253**    .372**   -.069      .009      .210**
      Sig. (2-tailed)         .        .281      .001      .000      .357      .899      .004
      N                       183       183       183       183       183       183       183
HT    Pearson Correlation     .080    1.000      .481**   -.007     -.013      .147*     .106
      Sig. (2-tailed)         .281      .        .000      .930      .863      .047      .153
      N                       183       183       183       183       183       183       183
WT    Pearson Correlation     .253**   .481**   1.000      .131     -.235**    .072      .291**
      Sig. (2-tailed)         .001     .000       .        .078      .001      .330      .000
      N                       183       183       183       183       183       183       183
CHL   Pearson Correlation     .372**  -.007      .131     1.000      .075      .269**    .294**
      Sig. (2-tailed)         .000     .930      .078       .        .313      .000      .000
      N                       183       183       183       183       183       183       183
ALB   Pearson Correlation    -.069    -.013     -.235**    .075     1.000      .454**    .039
      Sig. (2-tailed)         .357     .863      .001      .313       .        .000      .603
      N                       183       183       183       183       183       183       183
CA    Pearson Correlation     .009     .147*     .072      .269**    .454**   1.000      .178*
      Sig. (2-tailed)         .899     .047      .330      .000      .000       .        .016
      N                       183       183       183       183       183       183       183
UA    Pearson Correlation     .210**   .106      .291**    .294**    .039      .178*    1.000
      Sig. (2-tailed)         .004     .153      .000      .000      .603      .016       .
      N                       183       183       183       183       183       183       183

** Correlation is significant at the 0.01 level (2-tailed).
*  Correlation is significant at the 0.05 level (2-tailed).

For partial correlation the following dialogue appears:

The output for partial correlation:

- - - P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S - - -

Controlling for..  AGE  HT  WT

            CHL        ALB        CA         UA
CHL       1.0000      .1299      .2957      .2338
          (    0)    (  178)    (  178)    (  178)
          P= .       P= .082    P= .000    P= .002
ALB        .1299     1.0000      .4778      .1226
          (  178)    (    0)    (  178)    (  178)
          P= .082    P= .       P= .000    P= .101
CA         .2957      .4778     1.0000      .1737
          (  178)    (  178)    (    0)    (  178)
          P= .000    P= .000    P= .       P= .020
UA         .2338      .1226      .1737     1.0000
          (  178)    (  178)    (  178)    (    0)
          P= .002    P= .101    P= .020    P= .

(Coefficient / (D.F.) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed

Compare these with the bivariate correlations:

Partial Correlations (controlling for AGE, HT, WT)

         CHL       ALB       CA        UA
CHL     1.0000     .1299     .2957     .2338
ALB      .1299    1.0000     .4778     .1226
CA       .2957     .4778    1.0000     .1737
UA       .2338     .1226     .1737    1.0000

Bivariate Correlations (rows and columns for CHL, ALB, CA and UA of the full correlation table)

         CHL       ALB       CA        UA
CHL     1.000      .075      .269**    .294**
ALB      .075     1.000      .454**    .039
CA       .269**    .454**   1.000      .178*
UA       .294**    .039      .178*    1.000

** Correlation is significant at the 0.01 level (2-tailed).
*  Correlation is significant at the 0.05 level (2-tailed).

In the last example the bivariate and partial correlations were roughly in agreement.

This is not necessarily the case in all situations.

An Example:

The following data was collected on the following three variables:

1. Age

2. Calcium Intake in diet (CAI)

3. Bone Mass density (BMI)

The data

Age   CAI     BMI      Age   CAI     BMI      Age   CAI     BMI
25    75.2    147.7    45    62.5    239.8    65    66.8    298.9
25    83.6    166.7    45    84.6    257.5    65    53.5    280.6
25    112.2   254.9    45    107     317.2    65    64.9    287.2
25    99.8    193      45    82.3    280.3    65    63.8    302.2
25    93.1    199      45    69.9    232.8    65    52.6    263.3
25    97.7    202.6    45    100.6   270.5    65    58.8    296.2
25    103.7   231.7    45    74.1    228.6    65    61.4    294.2
25    101.8   199.7    45    60.2    231.8    65    59.6    294.3
25    99.6    182.4    45    94.6    252.5    65    62.9    250.4
25    94.9    202.8    45    80.7    254.9    65    52      265
25    99.6    204.7    45    94.4    266.3    65    60.4    267.6
25    100.2   206.6    45    73.1    227.9    65    61.2    287.3
25    116.9   280      45    81.2    245      65    67.4    299.8
25    97.3    186.9    45    106.1   297.8    65    51.5    273.2
25    98.8    217.9    45    79.3    217.7    65    60.7    284.2
25    90.6    198.7    45    85.1    263.9    65    56.3    290.2
25    101.7   190.4    45    81.9    280.5    65    72.9    306.7
25    98.6    221.3    45    98.7    281      65    40.3    258.8
25    93      191.8    45    89.1    275.4    65    47.1    283.6
25    108.1   216.2    45    71.6    225.2    65    76.9    323.5
25    78.9    161.3    45    76.9    240.9    65    64.7    303.4
25    87.1    188.6    45    79.7    252.1    65    59.9    297.9
35    96      248      55    61.1    238.1    75    37.8    277.9
35    97      261      55    60.5    250.1    75    43      287.5
35    94.8    237.5    55    82      285.9    75    33.8    305.9
35    78.1    225.8    55    70.7    267.1    75    41.5    320.3
35    93.1    239.7    55    71.9    258.2    75    71.2    353.3
35    74.3    205.7    55    64.4    245.3    75    58      345
35    100.1   255.7    55    66.3    284.9    75    53.4    325.5
35    95      244.3    55    64.3    281.7    75    37.5    268.5
35    77      202.3    55    59      249.9    75    41.6    312.1
35    94.5    231.6    55    83.8    318.1    75    50.9    282.8
35    108.6   288.2    55    68.9    274.4    75    57.5    346.9
35    92.4    221.2    55    77.8    266.6    75    51.8    323
35    104.3   262      55    63.9    274.9    75    64.9    343.8
35    87      218.7    55    75.8    277.6    75    44      299.2
35    88.6    232.3    55    78.8    291.5    75    49.4    313.4
35    97      252.7    55    82.6    302.8    75    54.4    294.9
35    85      213.8    55    65.4    270.7    75    53.9    321.3
35    96.1    232.8    55    59.5    231.3    75    48.7    262
35    111.2   288.6    55    54.4    239      75    51.4    325.7
35    83.9    230.3    55    56.6    247.8    75    49.2    345.5
35    100.1   248.2    55    55.6    219.9    75    36.5    260.9
35    91.2    258.6    55    64      249.2    75    61      333.8

Bivariate correlations

Correlations

                              AGE       CAI       BMI
AGE   Pearson Correlation    1.000    -.863**    .800**
      Sig. (2-tailed)         .        .000      .000
      N                       132       132       132
CAI   Pearson Correlation    -.863**  1.000     -.447**
      Sig. (2-tailed)         .000      .        .000
      N                       132       132       132
BMI   Pearson Correlation     .800**  -.447**   1.000
      Sig. (2-tailed)         .000     .000       .
      N                       132       132       132

** Correlation is significant at the 0.01 level (2-tailed).

Partial correlations

- - - P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S - - -

Controlling for..  AGE

            CAI        BMI
CAI       1.0000      .8057
          (    0)    (  129)
          P= .       P= .000
BMI        .8057     1.0000
          (  129)    (    0)
          P= .000    P= .

(Coefficient / (D.F.) / 2-tailed Significance)
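The sign change between the bivariate correlation of CAI and BMI (-.447) and their partial correlation given AGE (.8057) can be reproduced approximately from the three bivariate correlations alone, using the standard first-order partial correlation formula. The sketch below is an illustration, not the SPSS computation; the function name `partial_corr` is an assumption, and the inputs are the rounded values from the correlation table, so the result matches the SPSS figure only to about two decimals.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = CAI, y = BMI, z = AGE (rounded bivariate correlations from the table)
r_cai_bmi_given_age = partial_corr(r_xy=-0.447, r_xz=-0.863, r_yz=0.800)
print(round(r_cai_bmi_given_age, 2))
```

Because AGE is strongly negatively correlated with CAI and strongly positively correlated with BMI, removing its effect turns a negative association into a strongly positive one.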

[Figure: Scatter plot CAI vs BMI (r = -0.447). Horizontal axis: Ca intake (0 to 150). Vertical axis: BMI (0 to 400).]

[Figure: Scatter plot CAI vs BMI with legend 25, 35, 45, 55, 65, 75. Horizontal axis: Ca Intake (0 to 150). Vertical axis: BMI (0 to 400).]

[Figure: 3D Plot of Age, CAI and BMI.]

Transformations Theorem

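Let $x_1, x_2, \dots, x_n$ denote random variables with joint probability density function

$$f(x_1, x_2, \dots, x_n)$$

Let

$$\begin{aligned} u_1 &= h_1(x_1, x_2, \dots, x_n) \\ u_2 &= h_2(x_1, x_2, \dots, x_n) \\ &\;\;\vdots \\ u_n &= h_n(x_1, x_2, \dots, x_n) \end{aligned}$$

define an invertible transformation from the x's to the u's.

Then the joint probability density function of $u_1, u_2, \dots, u_n$ is given by:

$$g(u_1, \dots, u_n) = f(x_1, \dots, x_n) \left| \frac{d(x_1, \dots, x_n)}{d(u_1, \dots, u_n)} \right| = f(x_1, \dots, x_n) \, |J|$$

where the x's on the right-hand side are expressed in terms of the u's and

$$J = \frac{d(x_1, \dots, x_n)}{d(u_1, \dots, u_n)} = \det \begin{bmatrix} \dfrac{dx_1}{du_1} & \cdots & \dfrac{dx_1}{du_n} \\ \vdots & & \vdots \\ \dfrac{dx_n}{du_1} & \cdots & \dfrac{dx_n}{du_n} \end{bmatrix}$$

is the Jacobian of the transformation.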

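Example

Suppose that $x_1, x_2$ are independent with density functions $f_1(x_1)$ and $f_2(x_2)$.

Find the distribution of

$$u_1 = x_1 + x_2$$

$$u_2 = x_1 - x_2$$

Solving for $x_1$ and $x_2$ we get the inverse transformation

$$x_1 = \frac{u_1 + u_2}{2}, \qquad x_2 = \frac{u_1 - u_2}{2}$$

The Jacobian of the transformation

$$J = \frac{d(x_1, x_2)}{d(u_1, u_2)} = \det \begin{bmatrix} \dfrac{dx_1}{du_1} & \dfrac{dx_1}{du_2} \\ \dfrac{dx_2}{du_1} & \dfrac{dx_2}{du_2} \end{bmatrix} = \det \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix} = \left( \tfrac{1}{2} \right) \left( -\tfrac{1}{2} \right) - \left( \tfrac{1}{2} \right) \left( \tfrac{1}{2} \right) = -\tfrac{1}{2}$$

The joint density of $x_1, x_2$ is

$$f(x_1, x_2) = f_1(x_1) \, f_2(x_2)$$

Hence the joint density of $u_1$ and $u_2$ is:

$$g(u_1, u_2) = f(x_1, x_2) \, |J| = f_1\!\left( \frac{u_1 + u_2}{2} \right) f_2\!\left( \frac{u_1 - u_2}{2} \right) \frac{1}{2}$$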
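As a sanity check on this result, take the special case where $x_1$ and $x_2$ are both standard normal (an assumption made here purely for illustration). Then $u_1$ and $u_2$ are known to be independent N(0, 2) variables, so the density from the transformation formula should equal the product of two N(0, 2) densities. The function names in this sketch are hypothetical.

```python
import math

def normal_pdf(x, var=1.0):
    """Density of a N(0, var) random variable."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def g(u1, u2, f1=normal_pdf, f2=normal_pdf):
    """Joint density of u1 = x1 + x2, u2 = x1 - x2 for independent x1, x2.

    Uses g(u1, u2) = f1((u1 + u2)/2) * f2((u1 - u2)/2) * |J| with |J| = 1/2.
    """
    return 0.5 * f1((u1 + u2) / 2.0) * f2((u1 - u2) / 2.0)

# Compare with the product of two N(0, 2) densities at an arbitrary point
u1, u2 = 0.3, -1.2
from_jacobian = g(u1, u2)
direct = normal_pdf(u1, var=2.0) * normal_pdf(u2, var=2.0)
print(from_jacobian, direct)
```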

Theorem

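Let $x_1, x_2, \dots, x_n$ denote random variables with joint probability density function

$$f(x_1, x_2, \dots, x_n)$$

Let

$$\begin{aligned} u_1 &= a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n + c_1 \\ u_2 &= a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n + c_2 \\ &\;\;\vdots \\ u_n &= a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n + c_n \end{aligned}$$

define an invertible linear transformation from the x's to the u's:

$$\mathbf{u} = A\mathbf{x} + \mathbf{c} \quad \text{or} \quad \mathbf{x} = A^{-1}(\mathbf{u} - \mathbf{c})$$

Then the joint probability density function of $u_1, u_2, \dots, u_n$ is given by:

$$g(u_1, \dots, u_n) = f(x_1, \dots, x_n) \, \frac{1}{|\det A|} = f\!\left( A^{-1}(\mathbf{u} - \mathbf{c}) \right) \frac{1}{|\det A|}$$

where

$$\det A = \det \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$$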

Theorem

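Suppose that the random vector $\mathbf{x} = [x_1, x_2, \dots, x_p]'$ has a p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$, and let $A$ be an invertible $p \times p$ matrix. Then

$$\mathbf{u} = A\mathbf{x} + \mathbf{c}$$

has a p-variate normal distribution with mean vector

$$\boldsymbol{\mu}_u = A\boldsymbol{\mu} + \mathbf{c}$$

and covariance matrix

$$\Sigma_u = A \Sigma A'$$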


Proof

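$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$$

then

$$g(\mathbf{u}) = f\!\left( A^{-1}(\mathbf{u} - \mathbf{c}) \right) \frac{1}{|\det A|}$$

$$= \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2} |\det A|} \, e^{-\frac{1}{2} \left( A^{-1}(\mathbf{u} - \mathbf{c}) - \boldsymbol{\mu} \right)' \Sigma^{-1} \left( A^{-1}(\mathbf{u} - \mathbf{c}) - \boldsymbol{\mu} \right)}$$

$$= \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2} |\det A|} \, e^{-\frac{1}{2} (\mathbf{u} - \mathbf{c} - A\boldsymbol{\mu})' (A^{-1})' \Sigma^{-1} A^{-1} (\mathbf{u} - \mathbf{c} - A\boldsymbol{\mu})}$$

since

$$A^{-1}(\mathbf{u} - \mathbf{c}) - \boldsymbol{\mu} = A^{-1}(\mathbf{u} - \mathbf{c} - A\boldsymbol{\mu})$$

Also

$$|\Sigma|^{1/2} |\det A| = \left( |A| \, |\Sigma| \, |A'| \right)^{1/2} = |A \Sigma A'|^{1/2}$$

and

$$(A^{-1})' \Sigma^{-1} A^{-1} = (A \Sigma A')^{-1}$$

hence

$$g(\mathbf{u}) = \frac{1}{(2\pi)^{p/2} |A \Sigma A'|^{1/2}} \, e^{-\frac{1}{2} \left( \mathbf{u} - (A\boldsymbol{\mu} + \mathbf{c}) \right)' (A \Sigma A')^{-1} \left( \mathbf{u} - (A\boldsymbol{\mu} + \mathbf{c}) \right)}$$

which is the p-variate normal density with mean vector $A\boldsymbol{\mu} + \mathbf{c}$ and covariance matrix $A \Sigma A'$.

QED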
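The identity established in the proof can also be checked numerically at a single point: the density obtained from the transformation formula should agree with the normal density that has mean $A\boldsymbol{\mu} + \mathbf{c}$ and covariance $A \Sigma A'$. The sketch below does this; the helper `mvn_pdf` and all numerical values are illustrative assumptions.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of a multivariate normal distribution evaluated at x."""
    x, mu, sigma = (np.asarray(v, dtype=float) for v in (x, mu, sigma))
    p = mu.size
    d = x - mu
    quad = d @ np.linalg.solve(sigma, d)  # (x - mu)' Sigma^{-1} (x - mu)
    norm = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

# Made-up parameters and an invertible linear map u = A x + c
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])
c = np.array([0.5, -0.5])
u = np.array([0.2, 1.0])

# Left: transformation formula g(u) = f(A^{-1}(u - c)) / |det A|
lhs = mvn_pdf(np.linalg.solve(A, u - c), mu, sigma) / abs(np.linalg.det(A))
# Right: normal density with mean A mu + c and covariance A Sigma A'
rhs = mvn_pdf(u, A @ mu + c, A @ sigma @ A.T)
print(lhs, rhs)
```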

Theorem

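Suppose that the random vector $\mathbf{x}$ has a p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.

Let $A$ be a $q \times p$ matrix of rank $q \le p$.

Then $A\mathbf{x}$ has a q-variate normal distribution with mean vector

$$\boldsymbol{\mu}_{Ax} = A\boldsymbol{\mu}$$

and covariance matrix

$$\Sigma_{Ax} = A \Sigma A'$$

Proof

Let $B$ be a $(p - q) \times p$ matrix so that

$$C = \begin{bmatrix} A \\ B \end{bmatrix}$$

is invertible. Then

$$\mathbf{u} = C\mathbf{x} = \begin{bmatrix} A \\ B \end{bmatrix} \mathbf{x} = \begin{bmatrix} A\mathbf{x} \\ B\mathbf{x} \end{bmatrix}$$

is p-variate normal with mean vector

$$\boldsymbol{\mu}_u = C\boldsymbol{\mu} = \begin{bmatrix} A\boldsymbol{\mu} \\ B\boldsymbol{\mu} \end{bmatrix}$$

and covariance matrix

$$\Sigma_u = C \Sigma C' = \begin{bmatrix} A \\ B \end{bmatrix} \Sigma \begin{bmatrix} A' & B' \end{bmatrix} = \begin{bmatrix} A \Sigma A' & A \Sigma B' \\ B \Sigma A' & B \Sigma B' \end{bmatrix}$$

Thus the marginal distribution of $A\mathbf{x}$ is q-variate normal with mean vector $A\boldsymbol{\mu}$ and covariance matrix $A \Sigma A'$.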
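As an illustration of this theorem, the sketch below applies a 2 x 4 matrix of rank 2 to the 4-variate normal distribution from the earlier example, giving the joint distribution of $x_1 + x_2$ and $x_3 - x_4$. The choice of matrix $A$ is an assumption made for illustration.

```python
import numpy as np

# Mean vector and covariance matrix of the 4-variate example
mu = np.array([10.0, 15.0, 6.0, 14.0])
sigma = np.array([[4.0, 2.0, 4.0, 2.0],
                  [2.0, 17.0, 6.0, 5.0],
                  [4.0, 6.0, 14.0, 6.0],
                  [2.0, 5.0, 6.0, 7.0]])

# Illustrative q x p matrix (q = 2, p = 4): rows give x1 + x2 and x3 - x4
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])

# The theorem requires A to have full row rank q
rank = np.linalg.matrix_rank(A)

mean_Ax = A @ mu          # mean vector of A x
cov_Ax = A @ sigma @ A.T  # covariance matrix of A x

print(rank)
print(mean_Ax)
print(cov_Ax)
```

For instance, the variance of $x_1 + x_2$ is $4 + 17 + 2(2) = 25$, which is the first diagonal entry of the computed covariance matrix.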