Post on 14-Dec-2015

Generating Correlated Random Variables

Kriss Harris, Senior Statistician

Kriss.5.Harris@gsk.com

Why?

• I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables.

Previous Method

• Use Excel to fill down and then generate another column that was fairly correlated.

Generating Correlated Random Variables using the SAS Datastep

data bivariate_final;
  mean1 = 0;    * mean for y1;
  mean2 = 10;   * mean for y2;
  sig1 = 2;     * SD for y1;
  sig2 = 5;     * SD for y2;
  rho = 0.90;   * Correlation between y1 and y2;
  do i = 1 to 100;
    r1 = rannor(1245);
    r2 = rannor(2923);
    y1 = mean1 + sig1*r1;
    y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2;
    output;
  end;
run;
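A short derivation may help show why this construction gives the requested correlation. It is a sketch that assumes r1 and r2 behave as independent standard normal draws; the symbols \mu, \sigma and \rho stand for the mean, sig and rho variables in the code.

```latex
\begin{align*}
y_1 &= \mu_1 + \sigma_1 r_1, \qquad
y_2 = \mu_2 + \rho\,\sigma_2\, r_1 + \sqrt{\sigma_2^2 - \sigma_2^2\rho^2}\; r_2 \\[4pt]
\operatorname{Var}(y_1) &= \sigma_1^2 \\
\operatorname{Var}(y_2) &= \rho^2\sigma_2^2 + \left(\sigma_2^2 - \sigma_2^2\rho^2\right) = \sigma_2^2 \\
\operatorname{Cov}(y_1, y_2) &= \operatorname{Cov}(\sigma_1 r_1,\ \rho\,\sigma_2\, r_1) = \rho\,\sigma_1\sigma_2 \\
\operatorname{Corr}(y_1, y_2) &= \frac{\rho\,\sigma_1\sigma_2}{\sigma_1\sigma_2} = \rho
\end{align*}
```

With sig1 = 2, sig2 = 5 and rho = 0.90, the coefficient on r1 in y2 is 4.5 and the coefficient on r2 is sqrt(4.75).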

[Figure: y and x for different correlation coefficients]

Generating Correlated Random Variables using Proc IML

• To generate more than two correlated random variables, it is easier to use the Cholesky decomposition method in Proc IML.

• IML = Interactive Matrix Language
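As an illustration of the more-than-two-variables case, the same idea can be sketched in base R. The three means, standard deviations and the correlation matrix below are made-up values chosen for the example, not taken from the slides.

```r
# Sketch: three correlated normal variables via the Cholesky factor.
# All numeric inputs below are illustrative assumptions.
set.seed(2015)

n    <- 1000
mu   <- c(0, 10, -5)                 # assumed means
sds  <- c(2, 5, 1)                   # assumed standard deviations
corr <- matrix(c(1.0, 0.9, 0.3,
                 0.9, 1.0, 0.5,
                 0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)

# Covariance matrix C = D %*% corr %*% D, with D = diag(sds)
C <- diag(sds) %*% corr %*% diag(sds)

# chol() returns the upper-triangular U with t(U) %*% U equal to C
U <- chol(C)

# Independent standard normal draws, one column per variable
Z <- matrix(rnorm(n * 3), nrow = n, ncol = 3)

# Rows of Z %*% U have covariance t(U) %*% U = C; then add the means
Y <- sweep(Z %*% U, 2, mu, "+")

round(cor(Y), 2)   # should be close to corr, up to sampling error
```

Adding a fourth variable only means extending mu, sds and corr; the correlation matrix has to stay positive definite for chol() to succeed.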

proc iml;
  use bivariate_final;
  read all var {r1} into x3;
  read all var {r2} into x4;
  read all var {mean1} into mean1;
  read all var {mean2} into mean2;

  x = {4 9, 9 25};  /* C */
  mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]);

  Cholesky_decomp = root(x);  /* U */

  matrix_con = x3 || x4;
  mean = mean1 || mean2;

  final_simulated = mean + matrix_con * Cholesky_decomp;  /* RC */
  varnames = {y3 y4};
  create Cholesky_correlation from final_simulated (|colname = varnames|);
  append from final_simulated;
quit;
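For this particular matrix the Cholesky factor can be written out by hand, which shows that the Proc IML result matches the Datastep formula. This is a worked sketch; it relies on root returning the upper-triangular U with U'U = C.

```latex
\begin{align*}
C &= \begin{pmatrix} 4 & 9 \\ 9 & 25 \end{pmatrix}, \qquad
U = \begin{pmatrix} 2 & 4.5 \\ 0 & \sqrt{4.75} \end{pmatrix}, \qquad
U^{\top}U = \begin{pmatrix} 4 & 9 \\ 9 & 20.25 + 4.75 \end{pmatrix} = C \\[6pt]
\begin{pmatrix} r_1 & r_2 \end{pmatrix} U
  &= \begin{pmatrix} 2\,r_1 & \; 4.5\,r_1 + \sqrt{4.75}\; r_2 \end{pmatrix}
\end{align*}
```

Since sig1 = 2, rho*sig2 = 4.5 and sig2**2 - sig2**2*rho**2 = 4.75, adding the means gives y3 = y1 and y4 = y2 for the same r1 and r2.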

Notes on the Proc IML code:

• use is similar to set in a Datastep; the read statements read in the simulated data (r1, r2) and the means.
• x is the variance-covariance matrix (C).
• root(x) applies the Cholesky decomposition (U).
• x3||x4 and mean1||mean2 concatenate the variables.
• final_simulated contains the correlated variables (RC).
• create and append output the variables.

References

• Generating Multivariate Normal Data by using Proc IML. Lingling Han, University of Georgia, Athens, GA.

Appendix

• Correlation Coefficient =
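A standard (Pearson) definition is given here as an assumption about the intended formula, together with its sample version and a check against the variance-covariance matrix used in the Proc IML code.

```latex
\begin{align*}
\rho &= \frac{\operatorname{Cov}(y_1, y_2)}{\sigma_1\,\sigma_2},
\qquad
r = \frac{\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)(y_{2i} - \bar{y}_2)}
         {\sqrt{\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)^2 \; \sum_{i=1}^{n} (y_{2i} - \bar{y}_2)^2}} \\[6pt]
C &= \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
   = \begin{pmatrix} 4 & 9 \\ 9 & 25 \end{pmatrix}
\quad\Longrightarrow\quad
\rho = \frac{9}{2 \times 5} = 0.9
\end{align*}
```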

R Code - Generating Correlated Random Variables

mean1 = 0
mean2 = 10
sig1 = 2
sig2 = 5
rho = 0.9

r1 = rnorm(100, 0, 1)
r2 = rnorm(100, 0, 1)

y1 = mean1 + sig1*r1
y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2

R Code - Generating Correlated Random Variables using Matrices

C = matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2)
cholc = chol(C)
R = matrix(c(r1, r2), nrow = 100, ncol = 2, byrow = F)
mean = matrix(c(mean1, mean2), nrow = 100, ncol = 2, byrow = T)
RC = mean + R %*% cholc

Use previous values of r1 and r2
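One way to confirm that the matrix version and the scalar version agree is to run both on the same r1 and r2 and compare. The sketch below is self-contained; the seed is an arbitrary assumption added for reproducibility, and the covariance matrix is built from sig1, sig2 and rho rather than typed in.

```r
# Sketch: compare the scalar formula with the Cholesky (matrix) version.
set.seed(1245)  # arbitrary seed, assumed only for reproducibility

mean1 <- 0; mean2 <- 10
sig1  <- 2; sig2  <- 5
rho   <- 0.9

r1 <- rnorm(100)
r2 <- rnorm(100)

# Scalar construction
y1 <- mean1 + sig1 * r1
y2 <- mean2 + rho * sig2 * r1 + sqrt(sig2^2 - sig2^2 * rho^2) * r2

# Matrix construction: C is the variance-covariance matrix, here (4, 9; 9, 25)
C  <- matrix(c(sig1^2,            rho * sig1 * sig2,
               rho * sig1 * sig2, sig2^2), nrow = 2, byrow = TRUE)
mu <- matrix(c(mean1, mean2), nrow = 100, ncol = 2, byrow = TRUE)
RC <- mu + cbind(r1, r2) %*% chol(C)

all.equal(unname(RC[, 1]), y1)   # both approaches give the same y1
all.equal(unname(RC[, 2]), y2)   # and the same y2
cor(y1, y2)                      # sample correlation, near rho up to sampling error
```

With only 100 draws the sample correlation will not equal 0.9 exactly; increasing the number of draws brings it closer.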
