today: feb 28


• Reading data from an existing SAS dataset

• One-way ANOVA

• Reading Le 7:5

• Reading C&S 7:A-H

Reading SAS Datasets

LIBNAME tomhs 'c:/my documents/ph5415/';
PROC CONTENTS DATA=tomhs.bpstudy;
PROC PRINT DATA=tomhs.bpstudy (obs=10);
RUN;

The libname statement tells SAS which directory (folder) the dataset is in.

DATA=tomhs.bpstudy tells SAS to look for a SAS dataset called bpstudy in the directory referenced by tomhs.

Sometimes your “raw” data is already a SAS dataset

PROC CONTENTS Output

The CONTENTS Procedure

Data Set Name: TOMHS.BPSTUDY                       Observations: 902
Member Type: DATA                                  Variables: 16
Engine: V8                                         Indexes: 0
Created: 9:07 Saturday, February 26, 2005          Observation Length: 128
Last Modified: 9:07 Saturday, February 26, 2005    Deleted Observations: 0

-----Alphabetic List of Variables and Attributes-----

 #   Variable   Type   Len   Pos
---------------------------------
 3   AGE        Num      8    16
 6   CHOL12     Num      8    40
 2   GROUP      Num      8     8
 8   HDL12      Num      8    56
 9   PULSE12    Num      8    64
10   PULSEBL    Num      8    72
 4   SBP12      Num      8    24
 5   SBPBL      Num      8    32
 1   SEX        Num      8     0
 7   TRIG12     Num      8    48
11   WT12       Num      8    80
12   WTBL       Num      8    88
13   cholbl     Num      8    96
14   hdlbl      Num      8   104
16   id         Char     6   120
15   trigbl     Num      8   112

PROC PRINT – 10 Observations

Obs  SEX  GROUP  AGE  SBP12  SBPBL  CHOL12  TRIG12  HDL12  PULSE12  PULSEBL   WT12   WTBL  cholbl  hdlbl  trigbl  id
  1    1      3   54      .  139.5       .       .      .        .       76      .  224.0     205     24     179  A00001
  2    2      6   62    129  144.0     241      65     66       80       72  124.0  141.0     260     75      67  A00010
  3    2      5   64    118  141.0     307     425     41       80       81  144.0  157.0     228     29     564  A00021
  4    1      5   47      .  134.0       .       .      .        .       80      .  214.0     194     66      49  A00023
  5    1      3   51      .  132.5       .       .      .        .       73      .  206.5     226     40      53  A00056
  6    1      2   62    133  133.0     196      72     44       72       76  211.0  227.5     207     47     126  A00075
  7    2      2   59    113  136.0     231      75     61       72       74  125.0  137.0     214     62     119  A00083
  8    1      3   63    127  137.5     217     137     35       64       74  195.0  211.5     214     37     165  A00105
  9    2      4   64    122  151.0     201      57     44       56       63  150.0  159.5     214     47     133  A00133
 10    2      5   52    122  140.0     209     105     57       60       81  168.5  196.5     215     55     105  A00143

Reading a SAS Dataset

DATA temp;
  SET tomhs.bpstudy;
  sbpdif = sbp12 - sbpbl;
PROC MEANS DATA=temp;
RUN;

The MEANS Procedure

Variable    N         Mean      Std Dev      Minimum      Maximum
SEX       902    1.3824834    0.4862633    1.0000000    2.0000000
GROUP     902    3.7882483    1.7874130    1.0000000    6.0000000
AGE       902   54.7727273    6.4039396   44.0000000   69.0000000
SBP12     848  124.1002358   15.1891840   87.0000000  187.0000000
SBPBL     902  140.3636364   12.4446043  113.5000000  190.0000000
CHOL12    849  220.8386337   38.8624342  111.0000000  456.0000000
TRIG12    849  106.9634865   62.5307082   24.0000000  592.0000000
HDL12     849   45.4923439   12.1059688   18.0000000  102.0000000
PULSE12   847   69.3506494   10.0301471   44.0000000  112.0000000
PULSEBL   901   73.6925638    8.6698610   48.0000000  109.0000000
WT12      848  176.8225236   30.4251368  105.5000000  286.0000000
WTBL      902  187.3791574   31.0782720  113.0000000  289.2500000
cholbl    900  228.2511111   38.4169684  113.0000000  357.0000000
hdlbl     900   43.6122222   11.6124701   17.0000000   97.0000000
trigbl    900  131.7366667   76.5211232   17.0000000  815.0000000
sbpdif    848  -16.5176887   14.4532685  -75.5000000   30.0000000

The SET statement reads in an observation from the existing SAS dataset. It replaces the INFILE and INPUT statements used when reading in text data.
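SAS arithmetic propagates missing values: when SBP12 is missing, sbpdif is missing too, which is why PROC MEANS reports N = 848 for sbpdif but 902 for SBPBL. A minimal Python sketch of the same rule (the notes use SAS; Python is used here only to illustrate), applied to the ten observations printed above, with None standing in for the SAS missing value "." as an assumption of the sketch:

```python
# SBP12 and SBPBL for the ten observations printed by PROC PRINT above.
# None stands in for the SAS missing value "." (an assumption of this sketch).
sbp12 = [None, 129, 118, None, None, 133, 113, 127, 122, 122]
sbpbl = [139.5, 144.0, 141.0, 134.0, 132.5, 133.0, 136.0, 137.5, 151.0, 140.0]


def sbp_change(follow_up, baseline):
    """Return follow_up - baseline, or None if either value is missing."""
    if follow_up is None or baseline is None:
        return None
    return follow_up - baseline


sbpdif = [sbp_change(f, b) for f, b in zip(sbp12, sbpbl)]
n_nonmissing = sum(d is not None for d in sbpdif)  # like the N column of PROC MEANS

print(sbpdif)
print(n_nonmissing)
```

Three of these ten people have no 12-month SBP, so only seven contribute an sbpdif value.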

One-Way Analysis of Variance

• Two-sample t-test: compare the means of two groups
  – Are the means different?

• What if we have more than two groups?

Examples:
• compare three different behavioral interventions
• compare 5 different BP drugs

Analysis of Variance

Could compare all pairs of means with t-tests

three groups: A-B, B-C, A-C

five groups:

A-B, A-C, A-D, A-E

B-C, B-D, B-E

C-D, C-E

D-E
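The number of pairs grows quickly: k groups give k(k - 1)/2 pairwise t-tests. A short Python sketch that enumerates them; the six-group case matches the 15 comparisons in the TOMHS LSMEANS table later in these notes:

```python
from itertools import combinations


def all_pairs(labels):
    """Every pair of groups that would need its own two-sample t-test."""
    return list(combinations(labels, 2))


print(all_pairs("ABC"))  # three groups: A-B, A-C, B-C

for k in (3, 5, 6):
    n_pairs = len(all_pairs("ABCDEF"[:k]))
    print(k, "groups:", n_pairs, "pairs; k(k-1)/2 =", k * (k - 1) // 2)
```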


Problem: multiple comparisons!

When performing many tests, we may reject a true null hypothesis by chance (Type I error)

With α = 0.05, each test has a 1 in 20 chance of rejecting a true null hypothesis

Even if all group means are equal, there is a fairly large chance that at least one pair will appear significantly different
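A rough illustration of how fast the chance of at least one false rejection grows with the number of tests m. This sketch assumes the m tests are independent, which pairwise comparisons that share groups are not, so the numbers are only an approximation:

```python
def prob_any_false_rejection(m, alpha=0.05):
    """P(at least one of m independent tests rejects a true null hypothesis)."""
    return 1 - (1 - alpha) ** m


# 1 test, 3 pairs (3 groups), 10 pairs (5 groups), 15 pairs (6 groups)
for m in (1, 3, 10, 15):
    print(m, round(prob_any_false_rejection(m), 3))
```

Under this approximation, the ten pairwise tests needed for five groups give roughly a 40% chance of at least one spurious "significant" difference.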


ANOVA simultaneously tests for a difference among k means

• Y is continuous
• k samples from k normal distributions
• each of size $n_i$, not necessarily equal
• each with a possibly different mean $\mu_i$
• each with constant variance $\sigma^2$

Constant variance

ANOVA is fairly robust to violations of constant variance (and normality)

Rule of thumb:

If the largest standard deviation is less than twice the smallest standard deviation, you’re OK.

A transformation of Y can sometimes achieve equal variance or normality
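The rule of thumb can be checked directly on the standard deviations of sbpdif reported for the six TOMHS groups in the MEANS output later in these notes. A minimal Python sketch:

```python
# Standard deviations of sbpdif for TOMHS groups 1-6, from the MEANS output
group_sd = [15.3474717, 11.6080607, 14.4977118, 14.0005223, 13.1844874, 14.3539675]

ratio = max(group_sd) / min(group_sd)  # largest SD divided by smallest SD
looks_ok = ratio < 2                   # rule of thumb for constant variance

print(round(ratio, 2), looks_ok)
```

The ratio is about 1.32, well under 2, so the constant-variance assumption looks reasonable for that example.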


$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

$H_a$: not all $\mu_i$ are equal

For each group i:

$n_i$ = number of observations

$\bar{Y}_i$ = sample mean

$s_i^2$ = sample variance

$\bar{Y}$ = overall mean

The two-sample t-test is the special case k = 2

The ANOVA F-test is sometimes referred to as a global or omnibus test

Two-sample t-test
• Compares means for two groups
• Compares variation between groups with variation within groups

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Numerator: variation between groups. Denominator: variation within groups.

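ANOVA F-test
• Compares means for all groups
• Compares variation between groups with variation within groups

$$F = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{Y}_i - \bar{Y})^2/(k-1)}{s_p^2}$$

Numerator: variation between groups, with each group mean compared to the grand mean. Denominator: variation within groups.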
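For k = 2 the ANOVA F statistic and the pooled two-sample t statistic carry the same information: F = t². A minimal Python sketch (the notes use SAS; Python's standard library is used here only to check the arithmetic) with hypothetical data for two groups:

```python
import math
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t statistic using the pooled standard deviation s_p."""
    n1, n2 = len(y1), len(y2)
    # pooled variance: ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))


def one_way_f(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_y = [y for g in groups for y in g]
    n, k = len(all_y), len(groups)
    grand = mean(all_y)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


# hypothetical data for two groups
a = [5.0, 7.0, 6.0, 9.0]
b = [3.0, 4.0, 2.0, 5.0, 4.0]

t = pooled_t(a, b)
f = one_way_f([a, b])
print(round(t, 4), round(t ** 2, 4), round(f, 4))
```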


Variation for all observations: $\sum_i\sum_j (Y_{ij} - \bar{Y})^2$

Called the “(corrected) total sum of squares” or SST

Can be divided into two parts:
• deviation of each individual observation from its sample mean
• deviation of the sample means from the overall mean

$$(Y_{ij} - \bar{Y}) = (Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y})$$

Similar to regression


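$(Y_{ij} - \bar{Y}_i)$ measures variation within samples

$(\bar{Y}_i - \bar{Y})$ measures variation between samples

Each has a corresponding “sum of squares”:

Sum of squares within (SSW) = $\sum_i\sum_j (Y_{ij} - \bar{Y}_i)^2$

Sum of squares between (SSB) = $\sum_i\sum_j (\bar{Y}_i - \bar{Y})^2 = \sum_i n_i(\bar{Y}_i - \bar{Y})^2$

The two parts add up to the total: SST = SSB + SSW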
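A small hypothetical example (k = 3 groups of 3, so n = 9) makes the decomposition concrete. This Python sketch computes each sum of squares directly from its definition and checks that the between and within parts add up to the total:

```python
from statistics import mean

# hypothetical data: k = 3 groups, n = 9 observations in total
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

all_y = [y for g in groups for y in g]
grand = mean(all_y)  # overall mean, Y-bar

# total: each observation vs the overall mean
sst = sum((y - grand) ** 2 for y in all_y)
# within: each observation vs its own group mean
ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
# between: group mean vs overall mean, one term per observation
ssb = sum((mean(g) - grand) ** 2 for g in groups for _ in g)

print(sst, ssb, ssw)  # 60.0 54.0 6.0
print(abs(sst - (ssb + ssw)) < 1e-9)
```

Here SSB = 54 and SSW = 6, so almost all of the variation in this toy data set is between groups.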

Analysis of Variance

Each sum of squares has a corresponding degrees of freedom (df):

SST: n - 1 df
SSB: k - 1 df
SSW: (n - 1) - (k - 1) = n - k df

Dividing each sum of squares by its degrees of freedom gives the mean squares:

MSW = SSW / (n-k) = average variation within k samples

MSB = SSB / (k-1) = average variation between k samples

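Analysis of Variance

MSW is an estimate of the common within-group variance, $\sigma^2$

$MSW = SSW/(n-k)$, where $SSW = \sum_i\sum_j (Y_{ij} - \bar{Y}_i)^2$

Sample variance for the ith group:

$$s_i^2 = \frac{\sum_j (Y_{ij} - \bar{Y}_i)^2}{n_i - 1}$$

so

$$SSW = \sum_i\sum_j (Y_{ij} - \bar{Y}_i)^2 = \sum_i (n_i - 1)s_i^2$$

$$MSW = \frac{\sum_i (n_i - 1)s_i^2}{\sum_i (n_i - 1)}$$

MSW = pooled variance for k groups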
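Because SSW = Σ(n_i - 1)s_i², MSW can be computed from group summaries alone. A Python sketch using the group sizes and standard deviations of sbpdif from the TOMHS GLM output later in these notes:

```python
# Group sizes and standard deviations of sbpdif for TOMHS groups 1-6,
# copied from the MEANS output of the PROC GLM example
n = [126, 121, 124, 129, 127, 221]
sd = [15.3474717, 11.6080607, 14.4977118, 14.0005223, 13.1844874, 14.3539675]

k = len(n)
n_total = sum(n)                                        # usable observations
ssw = sum((ni - 1) * si ** 2 for ni, si in zip(n, sd))  # SSW = sum of (n_i - 1) s_i^2
msw = ssw / (n_total - k)                               # pooled variance s_p^2

print(n_total, round(ssw, 1), round(msw, 2))
```

Up to rounding of the printed standard deviations, this matches the Error line of the GLM ANOVA table (SS = 163785.8945, MS = 194.5201 on 842 df).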


The null hypothesis is tested by looking at the F ratio:

F = MSB/MSW, compared to an F distribution with k - 1 and n - k df

If variation between groups is much greater than variation within groups, F >> 1: reject the null hypothesis

If F ≈ 1: fail to reject the null hypothesis


Results are often presented in an ANOVA table

Source SS df MS F p-value

Between SSB k-1 MSB MSB/MSW p

Within SSW n-k MSW

Total SST n-1

SAS uses “Model” for “Between” and “Error” for “Within”

ANOVA in SAS: two ways

PROC ANOVA DATA = LIPID;
  CLASS diet;
  MODEL lipid = diet;
RUN;

PROC GLM DATA = LIPID;
  CLASS diet;
  MODEL lipid = diet;
RUN;

Both test for a difference in mean lipid reduction between the two diets

PROC ANOVA and GLM

• Almost exactly the same for this case

• GLM is a more general procedure

TOMHS Study

• 6 treatment groups (variable GROUP)
  – Beta-blocker
  – Calcium channel blocker
  – Diuretic
  – Alpha-blocker
  – ACE inhibitor
  – Placebo
• All treatment groups were given a lifestyle intervention to lower BP

ANOVA – TOMHS Study

PROC GLM DATA=temp;
  CLASS group;
  MODEL sbpdif = group;
  MEANS group;
RUN;

OUTPUT

The GLM Procedure

Class Level Information

Class Levels Values

GROUP 6 1 2 3 4 5 6

Number of observations 902

NOTE: Due to missing values, only 848 observations can be used in this analysis

The CLASS statement creates the dummy (indicator) variables for GROUP for you; with 6 groups, the model has 5 degrees of freedom

GLM – OUTPUT

The GLM Procedure

Dependent Variable: sbpdif

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5       13149.8402     2629.9680     13.52   <.0001
Error            842      163785.8945      194.5201
Corrected Total  847      176935.7347

R-Square   Coeff Var   Root MSE   sbpdif Mean
0.074320   -84.43703   13.94705     -16.51769

If H0 is true, then F should be near 1

F = 2629.97/194.52 = 13.52

ANOVA TABLE

Root MSE: pooled (over 6 groups) standard deviation

Estimates σ
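Every summary statistic in this output follows from the ANOVA table. A Python sketch that recomputes F, R-Square, Root MSE, and Coeff Var from the printed sums of squares, degrees of freedom, and the sbpdif mean:

```python
import math

# Values copied from the GLM ANOVA table for sbpdif
ss_model, df_model = 13149.8402, 5
ss_error, df_error = 163785.8945, 842
ss_total = 176935.7347
mean_sbpdif = -16.51769

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error                # F = MSB / MSW
r_square = ss_model / ss_total               # share of total variation explained by GROUP
root_mse = math.sqrt(ms_error)               # pooled SD over the 6 groups
coeff_var = 100 * root_mse / mean_sbpdif     # negative because the mean change is negative

print(round(f_value, 2), round(r_square, 4), round(root_mse, 3), round(coeff_var, 2))
```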

GLM – OUTPUT

Source DF Type I SS Mean Square F Value Pr > F

GROUP 5 13149.84018 2629.96804 13.52 <.0001

Source DF Type III SS Mean Square F Value Pr > F

GROUP 5 13149.84018 2629.96804 13.52 <.0001

With no covariates in the model (only GROUP), this portion of the output is the same as the ANOVA table.

The GLM Procedure

Level of          -------------sbpdif-------------
GROUP       N         Mean           Std Dev
1         126    -20.0555556      15.3474717
2         121    -17.5289256      11.6080607
3         124    -21.8467742      14.4977118
4         129    -16.0697674      14.0005223
5         127    -17.6023622      13.1844874
6         221    -10.5950226      14.3539675
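The Model sum of squares can be recovered from this table alone: the overall mean is the size-weighted average of the group means, and SSB = Σ n_i(Ȳ_i - Ȳ)². A Python sketch; the results should agree with the sbpdif mean (-16.51769) and the Model line (13149.84 and 2629.97) of the ANOVA table, up to rounding:

```python
# Group sizes and sbpdif means for TOMHS groups 1-6, from the MEANS output
n = [126, 121, 124, 129, 127, 221]
group_mean = [-20.0555556, -17.5289256, -21.8467742,
              -16.0697674, -17.6023622, -10.5950226]

n_total = sum(n)
# overall mean = size-weighted average of the group means
grand = sum(ni * m for ni, m in zip(n, group_mean)) / n_total
# SSB = sum of n_i * (group mean - overall mean)^2
ssb = sum(ni * (m - grand) ** 2 for ni, m in zip(n, group_mean))
msb = ssb / (len(n) - 1)

print(round(grand, 4), round(ssb, 1), round(msb, 1))
```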

Contrasts

PROC GLM DATA=temp;
  CLASS group;
  MODEL sbpdif = group;
  MEANS group;
  ESTIMATE 'BB vs Placebo'   group 1 0 0 0 0 -1;
  ESTIMATE 'CCB vs Placebo'  group 0 1 0 0 0 -1;
  ESTIMATE 'Diur vs Placebo' group 0 0 1 0 0 -1;
  ESTIMATE 'AB vs Placebo'   group 0 0 0 1 0 -1;
  ESTIMATE 'ACE vs Placebo'  group 0 0 0 0 1 -1;
RUN;

OUTPUT
The GLM Procedure

Parameter           Estimate   Standard Error   t Value   Pr > |t|
BB vs Placebo     -9.4605329       1.55691725     -6.08     <.0001
CCB vs Placebo    -6.9339030       1.57727142     -4.40     <.0001
Diur vs Placebo  -11.2517516       1.56489344     -7.19     <.0001
AB vs Placebo     -5.4747448       1.54534422     -3.54     0.0004
ACE vs Placebo    -7.0073396       1.55300848     -4.51     <.0001
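Each of these ESTIMATE contrasts is a difference of two group means, and its standard error uses the pooled variance: SE = sqrt(MSW × (1/n_i + 1/n_j)). A Python sketch reproducing the contrast table from the group summaries and MSW, assuming (as the contrast labels indicate) that group 6 is placebo:

```python
import math

# Group sizes and sbpdif means from the MEANS output; group 6 is placebo
n = {1: 126, 2: 121, 3: 124, 4: 129, 5: 127, 6: 221}
ybar = {1: -20.0555556, 2: -17.5289256, 3: -21.8467742,
        4: -16.0697674, 5: -17.6023622, 6: -10.5950226}
msw = 194.5201  # Error mean square from the ANOVA table


def contrast_vs_placebo(i, placebo=6):
    """Estimate, standard error, and t value for mean_i - mean_placebo."""
    estimate = ybar[i] - ybar[placebo]
    se = math.sqrt(msw * (1 / n[i] + 1 / n[placebo]))
    return estimate, se, estimate / se


for label, i in [("BB", 1), ("CCB", 2), ("Diur", 3), ("AB", 4), ("ACE", 5)]:
    est, se, t = contrast_vs_placebo(i)
    print(f"{label} vs Placebo  {est:9.4f}  {se:7.4f}  {t:6.2f}")
```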

Compare all Groups

PROC GLM DATA=temp;
  CLASS group;
  MODEL sbpdif = group;
  LSMEANS group / PDIF;
RUN;

GLM – OUTPUT

The GLM Procedure
Least Squares Means

            sbpdif       LSMEAN
GROUP       LSMEAN       Number
1      -20.0555556       1
2      -17.5289256       2
3      -21.8467742       3
4      -16.0697674       4
5      -17.6023622       5
6      -10.5950226       6

Least Squares Means for effect GROUP
Pr > |t| for H0: LSMean(i)=LSMean(j)

i/j       1        2        3        4        5        6
1                0.1550   0.3103   0.0228   0.1622   <.0001
2       0.1550            0.0156   0.4087   0.9669   <.0001
3       0.3103   0.0156            0.0010   0.0161   <.0001
4       0.0228   0.4087   0.0010            0.3796   0.0004
5       0.1622   0.9669   0.0161   0.3796            <.0001
6       <.0001   <.0001   <.0001   0.0004   <.0001
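This table holds 15 pairwise tests, so even if all six means were equal, about 15 × 0.05 = 0.75 of them would be expected to fall below 0.05 by chance. A quick Python tally of the table; recording "<.0001" as 0.0001 (an upper bound) is an assumption of this sketch:

```python
# Pairwise p-values from the LSMEANS / PDIF table (upper triangle only).
# "<.0001" is recorded as 0.0001, an upper bound, for this sketch.
p = {
    (1, 2): 0.1550, (1, 3): 0.3103, (1, 4): 0.0228, (1, 5): 0.1622, (1, 6): 0.0001,
    (2, 3): 0.0156, (2, 4): 0.4087, (2, 5): 0.9669, (2, 6): 0.0001,
    (3, 4): 0.0010, (3, 5): 0.0161, (3, 6): 0.0001,
    (4, 5): 0.3796, (4, 6): 0.0004,
    (5, 6): 0.0001,
}
alpha = 0.05
significant = sorted(pair for pair, pv in p.items() if pv < alpha)

print(len(p))                          # number of pairwise comparisons
print(len(significant), significant)   # pairs with p < 0.05
```

Nine of the 15 pairs fall below 0.05, far more than chance alone would produce, but which individual pairs to trust still depends on which comparisons were planned in advance.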

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

P-value for Group 1 vs Group 2: 0.1550
