Lesson 8 - Topics
• Creating SAS datasets from procedures
• Using ODS and data steps to make reports
• Using PROC RANK
• Programs 14-15 in course notes
• LSB 4:11; 5:3
Making SAS Datasets From Procedures
Output from SAS PROCs can be put into SAS datasets:
1. To do further processing of the information from the output
2. To reformat output to make a report
3. To restructure original SAS dataset or create new variables
Ways to Put Output into SAS Datasets
• Using OUTPUT statement available from many procedures
• Using ODS OUTPUT statement – any output table can be put into a SAS dataset
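To use ODS OUTPUT you need the name of the output table a procedure produces. One way to look the names up is ODS TRACE, which writes the name of each output table to the log. The lines below are a minimal sketch; the dataset sashelp.class is an assumption used only so the example runs without the course data.

```sas
* Sketch: list the output tables that PROC UNIVARIATE creates;
* Assumption: sashelp.class stands in for the course dataset;
ODS TRACE ON;

PROC UNIVARIATE DATA = sashelp.class;
  VAR weight;
RUN;

ODS TRACE OFF;
* The log now shows one entry per output table, with a Name field;
* (for example Quantiles) that can be used in ODS OUTPUT;
```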
Report We Want to Generate
Quartiles of Weight by Gender and Center
sex       clinic     N      P25      P50      P75
Male      A          9    180.0    190.0    208.0
Male      B         16    158.3    174.8    218.3
Male      C         29    178.0    199.5    220.5
Male      D         11    172.0    184.5    194.0
Female    A          6    125.0    143.5    160.5
Female    B          9    150.0    164.5    184.0
Female    C          6    132.5    134.3    138.5
Female    D          6    131.0    137.5    148.5
Program 14
LIBNAME class 'C:\SAS_Files';

* Will use SAS dataset version of TOMHS data;
DATA wt;
  SET class.tomhsp (KEEP=ptid age sex clinic wtbl wt12);
  wtchg = wt12 - wtbl;
RUN;

PROC FORMAT;
  VALUE sexF 1 = 'Male' 2 = 'Female';
RUN;

* Create report by sex and clinic of univariate info;
PROC SORT DATA = wt;
  BY sex clinic;
RUN;

PROC UNIVARIATE DATA = wt NOPRINT;
  BY sex clinic;
  VAR wt12;
  OUTPUT OUT=univinfo N = n Q1 = p25 MEDIAN = p50 Q3 = p75;
RUN;
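Dataset univinfo will have one observation for each combination of sex and clinic.
In the OUTPUT statement, OUT= gives the name of the new dataset.
Each remaining option has the form statistic name = variable name, so Q1 = p25 stores the first quartile in the variable p25.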
PROC PRINT DATA = univinfo;
  FORMAT sex sexF.;
RUN;

Obs    sex       clinic     n       p75       p50       p25
  1    Male      A          9    208.00    190.00    180.00
  2    Male      B         16    218.25    174.75    158.25
  3    Male      C         29    220.50    199.50    178.00
  4    Male      D         11    194.00    184.50    172.00
  5    Female    A          6    160.50    143.50    125.00
  6    Female    B          9    184.00    164.50    150.00
  7    Female    C          6    138.50    134.25    132.50
  8    Female    D          6    148.50    137.50    131.00
PROC PRINT DATA = univinfo NOOBS;
  VAR sex clinic n p25 p50 p75;
  FORMAT p25 p50 p75 6.1;
  TITLE 'Quartiles of Weight by Gender/Center';
RUN;

Quartiles of Weight by Gender/Center

sex       clinic     N      P25      P50      P75
Male      A          9    180.0    190.0    208.0
Male      B         16    158.3    174.8    218.3
Male      C         29    178.0    199.5    220.5
Male      D         11    172.0    184.5    194.0
Female    A          6    125.0    143.5    160.5
Female    B          9    150.0    164.5    184.0
Female    C          6    132.5    134.3    138.5
Female    D          6    131.0    137.5    148.5
Using ODS to Send Output to a SAS Dataset
Syntax:
ODS OUTPUT output-table = new-data-set;

* Output quantile table to a dataset;
ODS OUTPUT quantiles = qwt;
PROC UNIVARIATE DATA = wt;
  VAR wtbl wt12;
RUN;
ODS OUTPUT CLOSE;

PROC PRINT DATA=qwt;
RUN;
Display of Output Dataset

Obs    Varname    Quantile      Estimate
  1    wtbl       100% Max        279.30
  2    wtbl       99%             274.15
  3    wtbl       95%             246.40
  4    wtbl       90%             237.40
  5    wtbl       75% Q3          215.15
  6    wtbl       50% Median      192.65
  7    wtbl       25% Q1          165.90
  8    wtbl       10%             141.50
  9    wtbl       5%              137.40
 10    wtbl       1%              130.25
 11    wtbl       0% Min          128.50
 12    wt12       100% Max        271.50
 13    wt12       99%             271.50
 14    wt12       95%             239.00
 15    wt12       90%             227.00
 16    wt12       75% Q3          202.50
 17    wt12       50% Median      180.00
 18    wt12       25% Q1          153.50
 19    wt12       10%             133.00
 20    wt12       5%              130.00
 21    wt12       1%              123.00
 22    wt12       0% Min          123.00

We would like to put the wtbl and wt12 estimates side by side.
* Separate the data into 2 datasets;
DATA wtbl wt12;
  SET qwt;
  IF varname = 'wtbl' THEN OUTPUT wtbl;
  ELSE IF varname = 'wt12' THEN OUTPUT wt12;
RUN;

* PROC DATASETS used for changing variable names;
PROC DATASETS;
  MODIFY wtbl;
  RENAME estimate = wtbl;
  MODIFY wt12;
  RENAME estimate = wt12;
RUN;
QUIT;

* Put 2 datasets side-by-side;
DATA all;
  MERGE wtbl wt12;
  DROP varname;
RUN;

PROC PRINT DATA = all;
RUN;
Obs Quantile wtbl wt12
1 100% Max 279.30 271.50
2 99% 274.15 271.50
3 95% 246.40 239.00
4 90% 237.40 227.00
5 75% Q3 215.15 202.50
6 50% Median 192.65 180.00
7 25% Q1 165.90 153.50
8 10% 141.50 133.00
9 5% 137.40 130.00
10 1% 130.25 123.00
11 0% Min 128.50 123.00
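The split, rename, and merge steps can also be written as a single DATA step by reading the quantile dataset twice with the WHERE= and RENAME= dataset options. This is only a sketch of an alternative, not the course's method. It assumes sashelp.class in place of the TOMHS data, and the names q and side are made up for illustration. UPCASE is used because the comparison with the VarName values is case-sensitive.

```sas
* Assumption: sashelp.class stands in for the course data;
ODS OUTPUT Quantiles = q;
PROC UNIVARIATE DATA = sashelp.class;
  VAR height weight;
RUN;

* Read q twice: once per analysis variable, renaming Estimate each time;
* MERGE without a BY statement pairs the rows in order (1st with 1st, ...);
DATA side;
  MERGE q (WHERE = (UPCASE(varname) = 'HEIGHT') RENAME = (estimate = height))
        q (WHERE = (UPCASE(varname) = 'WEIGHT') RENAME = (estimate = weight));
  DROP varname;
RUN;

PROC PRINT DATA = side NOOBS;
RUN;
```

The result has one row per quantile with the two estimates in separate columns, the same layout as the listing above. As with the original merge, this relies on both subsets listing the quantiles in the same order.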
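* PERSIST=PROC keeps the output request active across both PROC REG steps,
  so both sets of estimates go into betas;
* Assumes wt also contains dbpchg and sbpchg (not kept in the DATA step of Program 14);
ODS OUTPUT ParameterEstimates (persist=proc) = betas;

PROC REG DATA=wt;
  MODEL dbpchg = wtchg age sex;
RUN;

PROC REG DATA=wt;
  MODEL sbpchg = wtchg age sex;
RUN;
ODS OUTPUT CLOSE;

PROC PRINT DATA=betas;
RUN;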
Obs    Dependent    Variable     Estimate    StdErr    tValue    Probt
  1    dbpchg       Intercept      -0.059     6.431     -0.01     0.99
  2    dbpchg       wtchg           0.175     0.073      2.38     0.02
  3    dbpchg       age            -0.101     0.112     -0.91     0.37
  4    dbpchg       sex            -2.622     1.572     -1.67     0.10
  5    sbpchg       Intercept      -3.849    13.304     -0.29     0.77
  6    sbpchg       wtchg           0.364     0.152      2.40     0.02
  7    sbpchg       age            -0.042     0.231     -0.18     0.86
  8    sbpchg       sex            -4.118     3.253     -1.27     0.21
Display of Output Dataset - Report
Display of Output Dataset Using BY Statement

PROC PRINT DATA = betas;
  VAR variable estimate stderr tvalue probt;
  BY dependent NOTSORTED;
  FORMAT estimate 7.3 stderr 7.3 probt pvalue5.2;
RUN;

Dependent=dbpchg

Obs    Variable     Estimate    StdErr    tValue    Probt
  1    Intercept      -0.059     6.431     -0.01     0.99
  2    wtchg           0.175     0.073      2.38     0.02
  3    age            -0.101     0.112     -0.91     0.37
  4    sex            -2.622     1.572     -1.67     0.10

Dependent=sbpchg

Obs    Variable     Estimate    StdErr    tValue    Probt
  5    Intercept      -3.849    13.304     -0.29     0.77
  6    wtchg           0.364     0.152      2.40     0.02
  7    age            -0.042     0.231     -0.18     0.86
  8    sex            -4.118     3.253     -1.27     0.21
PROC RANK
• Used to divide observations into equal-size categories based on the values of a variable
• Creates a new variable containing the categories
• New variable is added to the dataset or to a new dataset
• Example: Divide weight change into 5 equal categories (quintiles)
PROC RANK SYNTAX
PROC RANK DATA = dataset OUT = outdataset GROUPS = # of categories;
  VAR varname;
  RANKS newvarname;
RUN;
Most of the time you can set OUT to be the same dataset specified in DATA.
PROC RANK produces no printed output; the new variable is written only to the OUT dataset.
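The quintile example mentioned above could be sketched as follows. It assumes the dataset wt from Program 14 (which contains wtchg) is still in the WORK library. The names wtq and qwtchg are hypothetical. Because PROC RANK prints nothing, a PROC FREQ step is added to check the group sizes.

```sas
* Sketch: divide weight change into quintiles;
* Assumes WORK.wt from Program 14 exists; wtq and qwtchg are made-up names;
PROC RANK DATA = wt GROUPS = 5 OUT = wtq;
  VAR wtchg;       * variable to be grouped;
  RANKS qwtchg;    * new variable, numbered 0 through 4;
RUN;

* Check how many observations fall in each quintile;
* Observations with missing wtchg get a missing qwtchg;
PROC FREQ DATA = wtq;
  TABLES qwtchg;
RUN;
```

Writing to a separate OUT dataset leaves wt unchanged, which can be convenient while experimenting with different values of GROUPS.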
PROGRAM 15
LIBNAME class 'C:\SAS_Files';

DATA wtchol;
  SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
  wtchg = wt12 - wtbl;
  cholchg = chol12 - cholbl;
RUN;

* This PROC will add a new variable to the dataset which is the tertile
  of weight change. The new variable will be 0, 1, or 2;
PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol;
  VAR wtchg;
  RANKS twtchg;   * Name of new variable;
RUN;
PARTIAL LOG
8 DATA wtchol;
9 SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
10 wtchg = wt12 - wtbl;
11 cholchg = chol12 - cholbl;
12 RUN;
NOTE: There were 100 observations read from the data set CLASS.TOMHSP.
NOTE: The data set WORK.WTCHOL has 100 observations and 9 variables.
19 PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol;
20 VAR wtchg; RANKS twtchg;
21 RUN;
NOTE: The data set WORK.WTCHOL has 100 observations and 10 variables.
PROC FREQ DATA = wtchol;
  TABLES twtchg;
RUN;
OUTPUT:
                 Rank for Variable wtchg

                                    Cumulative    Cumulative
twtchg    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
     0          31       33.70            31        33.70
     1          30       32.61            61        66.30
     2          31       33.70            92       100.00
Frequency Missing = 8
PROC PRINT DATA = wtchol (obs=20);
  VAR ptid wtchg twtchg;
  TITLE 'Partial Listing of Dataset wtchol with new variable added';
RUN;

Partial Listing of Dataset wtchol with new variable added

Obs    PTID        wtchg    twtchg
  1    A00083     -12.00      1
  2    A00301        .        .
  3    A00312      -9.50      1
  4    A00354     -21.00      0
  5    A00400        .        .
  6    A00504      -9.25      1
  7    A00608        .        .
  8    A00720     -18.50      0
  9    A00762      -5.25      2
 10    A00811      -6.75      1
PROC MEANS DATA = wtchol N MEAN MIN MAX MAXDEC=2;
  VAR cholchg wtchg;
  CLASS twtchg;
  TITLE 'Mean Cholesterol Change by Tertile of Weight Change';
RUN;
Mean Cholesterol Change by Tertile of Weight Change
The MEANS Procedure
Rank for
Variable      N
wtchg       Obs    Variable     N      Mean    Minimum    Maximum
------------------------------------------------------------------
       0     31    cholchg     30    -13.43     -55.00      47.00
                   wtchg       31    -22.51     -36.50     -14.30

       1     30    cholchg     30     -4.70     -37.00      26.00
                   wtchg       30    -10.21     -14.00      -6.80

       2     31    cholchg     31     -0.74     -52.00      45.00
                   wtchg       31     -1.82      -6.50      13.00
------------------------------------------------------------------
The three pairs of tertile means could be graphed in an x-y plot (3 points).
The minimum and maximum of wtchg within each group show the cutpoints for the tertiles.
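One way the 3-point plot might be produced is to save the tertile means with the OUTPUT statement of PROC MEANS and plot them with PROC SGPLOT. This is a sketch, not part of the course programs. It assumes wtchol (with twtchg) exists from Program 15. The names tmeans, mwtchg, and mcholchg are made up for illustration.

```sas
* Sketch: save the mean of each variable by tertile, then plot;
* Assumes WORK.wtchol from Program 15; output names are hypothetical;
PROC MEANS DATA = wtchol NWAY NOPRINT;
  CLASS twtchg;
  VAR wtchg cholchg;
  OUTPUT OUT = tmeans MEAN = mwtchg mcholchg;
RUN;

* One point per tertile: mean weight change vs mean cholesterol change;
PROC SGPLOT DATA = tmeans;
  SERIES X = mwtchg Y = mcholchg / MARKERS;
  XAXIS LABEL = 'Mean weight change';
  YAXIS LABEL = 'Mean cholesterol change';
RUN;
```

Observations with a missing twtchg are left out of the CLASS groups by default, so tmeans should contain one row for each of the three tertiles.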