
Week 3/4 [06+ Sept.] Class Activities

File: week-03-04-10sep07.doc
Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts

Week 3 Topic -- REPORT WRITING
* Introduce the Output Delivery System (ODS) for customizing procedure output
* PROC TABULATE for producing nicely-formatted tables

Week 4 Topic -- INTRODUCTION TO MODELING PROCS
* REG and GLM primarily

Bonus Material – conversational UNIX

ODS References

Gupta, S. (2003) Quick Results with the Output Delivery System. SAS Institute Inc., Cary, NC USA.

Delwiche LD and Slaughter SJ. (2003) The Little SAS Book: A Primer, 3rd edition. SAS Institute. Cary, NC, USA. [pages 144-157]

Haworth LE (2001) Output Delivery System: The Basics. SAS Institute Inc. Cary, NC USA.

ODS Basics

What is ODS?

* method of delivering output in a variety of formats (other than the default “listing” format)

* options available include HTML, Rich Text Format (RTF), PS, PDF, SAS data sets

Basic ODS Terminology

“destinations” – locations to which ODS routes output (e.g. LISTING, HTML, RTF, PRINTER, PDF, OUTPUT – new data set)

“objects” – output entities created by ODS to store the formatted results

“styles” – font/color/other attributes of a report

Basic syntax of ODS statements

* identify output objects;
ODS TRACE ON </options>;

* open output destination;
ODS destination <FILE=filename>;

* create SAS data set with output object;
ODS OUTPUT output-object-name=SAS-data-set-name;

* [optional] select particular objects for inclusion;
ODS <destination> SELECT output-object-name;

PROC … PROC …

PROC …

ODS <destination> CLOSE;

ODS TRACE OFF;

ODS to different file types

A familiar example

proc format;
  value totfmt 0='none'
               1-HIGH='some';
run;

data d1;
  infile "\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\ch2-dat.txt"
         firstobs=16 expandtabs missover pad;
  * infile 'M:\public.www\classes\sta402\SAS-programs\ch2-dat.txt' firstobs=16 expandtabs missover pad;
  input @9 animal 2. @17 conc 3. @25 brood1 2. @33 brood2 2.
        @41 brood3 2. @49 total 2.;

  cbrood3 = brood3;
  format cbrood3 totfmt.;

  label animal = 'animal ID number';
  label conc   = 'Nitrofen concentration';
  label brood1 = 'number of young in first brood';
  label brood2 = 'number of young in 2nd brood';
  label brood3 = 'number of young in 3rd brood';
  label total  = 'total young produced in three broods';
run;

proc print;
  where conc=0;
run;

/* aside: ODS LISTING open as a default. You can have multiple destinations open simultaneously. If you want to close the LISTING destination before generating output then type ODS LISTING CLOSE; before issuing the PROC for which output is desired.*/

/* generate HTML files with objects from 3 PROCs */

ODS TRACE ON;

* ODS HTML file='M:\public.www\classes\sta402\SAS-programs\day6-example.html';
ODS HTML file="\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\ODS-HTML-example.html";

proc plot;

plot total*conc=cbrood3 / vaxis=0 to 40 by 2;

run;

proc freq;

table conc*cbrood3 / nopct nocol chisq trend exact;

run;

proc univariate plot; by conc;

var total;

run;

ODS HTML CLOSE;

ODS TRACE OFF;

/* now generate HTML files with additional linkage info */

ODS TRACE ON;

ODS HTML path='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs'
         body = 'day6-example2.html'           /* Output objects */
         contents = 'day6-example2-TOC.html'   /* Table of contents */
         frame = 'day6-example2-frame.html'    /* organizes display */
         newfile = NONE;                       /* all results to one file */

* old code where the M drive was referenced vs. specification of the full path;
* ODS HTML path='M:\public.www\classes\sta402\SAS-programs' body='day6-example2.html' contents='day6-example2-TOC.html' frame='day6-example2-frame.html' newfile=NONE;

/* comment: newfile=NONE directs all output to the same body file
   (NONE is the documented default, so it is stated here only to be explicit)

   newfile=PAGE creates a new body file for each page of output
*/

proc plot;
  plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;

proc freq;

table conc*cbrood3 / nopct nocol chisq trend exact;

run;


proc univariate plot; by conc;
  var total;
run;

ODS HTML CLOSE;

ODS TRACE OFF;

/* select on one of the output objects for inclusion */

*ODS HTML file='M:\public.www\classes\sta402\SAS-programs\day6-example3.html';

ODS HTML file="\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example3.html";

ODS HTML SELECT SSPLOTS;

ODS HTML SHOW; /* write details to SASLOG confirming object sel. */


proc univariate plot; by conc;
  var total;
run;

ODS HTML CLOSE;

/* select different destinations */

options orientation=landscape nocenter nodate;

ODS ESCAPECHAR="^"; /* for fancy formatting later */

/* old program with M drive reference
ODS RTF file='M:\public.www\classes\sta402\SAS-programs\day6-example.rtf';
ODS PDF file='M:\public.www\classes\sta402\SAS-programs\day6-example.pdf';
ODS PS file='M:\public.www\classes\sta402\SAS-programs\day6-example.ps';
*/

ODS RTF file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.rtf';
ODS PDF file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.pdf';
ODS PS file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.ps';

Title 'Plot of number of young vs. Nitrofen concentration^{super a}';
Footnote1 '^{super a}s=some young produced in Brood 3, n=no young produced in Brood 3';

proc plot;
  plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;

ODS RTF CLOSE;
ODS PDF CLOSE;
ODS PS CLOSE;

ODS to create output data sets

proc sort data=d1; by conc; run;

ODS TRACE ON; /* see what ODS objects are created by univariate */
proc univariate data=d1; by conc; var total; run;
ODS TRACE OFF;

ODS OUTPUT Quantiles=data_quant; /* extract quantiles */
proc univariate data=d1; by conc; var total; run;
ODS OUTPUT CLOSE;
proc print data=data_quant; run;

 Obs  conc  VarName  Quantile      Estimate
   1     0  total    100% Max          36.0
   2     0  total    99%               36.0
   3     0  total    95%               36.0
   4     0  total    90%               35.0
   5     0  total    75% Q3            34.0
   6     0  total    50% Median        32.5
   7     0  total    25% Q1            30.0
   8     0  total    10%               25.5
   9     0  total    5%                24.0
  10     0  total    1%                24.0
  11     0  total    0% Min            24.0
 . . . . . . . . . . edited output . . . . . . . . . .
  45   310  total    100% Max          15.0
  46   310  total    99%               15.0
  47   310  total    95%               15.0
  48   310  total    90%               11.0
  49   310  total    75% Q3             6.0
  50   310  total    50% Median         6.0
  51   310  total    25% Q1             5.0
  52   310  total    10%                2.0
  53   310  total    5%                 0.0
  54   310  total    1%                 0.0
  55   310  total    0% Min             0.0

/* can also create multiple data sets */
ODS OUTPUT Quantiles(MATCH_ALL=conc_name_macro)=data_quant;
proc univariate data=d1; by conc; var total; run;
ODS OUTPUT CLOSE;
proc print data=data_quant; run;

from the SAS LOG file
NOTE: The data set WORK.DATA_QUANT has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=0
NOTE: The data set WORK.DATA_QUANT1 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=80
NOTE: The data set WORK.DATA_QUANT2 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=160
NOTE: The data set WORK.DATA_QUANT3 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=235
NOTE: The data set WORK.DATA_QUANT4 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=310

/* write the data set names to the SAS LOG */
%put The conc_name_macro variable contains the following data sets &conc_name_macro;

76   %put The conc_name_macro variable contains the following data sets &conc_name_macro;

The conc_name_macro variable contains the following data sets DATA_QUANT DATA_QUANT1 DATA_QUANT2 DATA_QUANT3 DATA_QUANT4

/* merge the concentration summary files to create single table */
data c0;   set DATA_QUANT;  rename Estimate=C0_Est;   key=_n_; drop VarName conc; run;
data c80;  set DATA_QUANT1; rename Estimate=C80_Est;  key=_n_; drop VarName conc; run;
data c160; set DATA_QUANT2; rename Estimate=C160_Est; key=_n_; drop VarName conc; run;
data c235; set DATA_QUANT3; rename Estimate=C235_Est; key=_n_; drop VarName conc; run;
data c310; set DATA_QUANT4; rename Estimate=C310_Est; key=_n_; drop VarName conc; run;

data all; merge c0 c80 c160 c235 c310; by key; drop key; run;

proc print data=all; run;

 Obs  Quantile     C0_Est  C80_Est  C160_Est  C235_Est  C310_Est
   1  100% Max       36.0     36.0      31.0      27.0        15
   2  99%            36.0     36.0      31.0      27.0        15
   3  95%            36.0     36.0      31.0      27.0        15
   4  90%            35.0     35.5      30.5      25.0        11
   5  75% Q3         34.0     33.0      30.0      21.0         6
   6  50% Median     32.5     32.5      29.0      16.5         6
   7  25% Q1         30.0     29.0      27.0      13.0         5
   8  10%            25.5     26.5      24.5       9.5         2
   9  5%             24.0     26.0      23.0       7.0         0
  10  1%             24.0     26.0      23.0       7.0         0
  11  0% Min         24.0     26.0      23.0       7.0         0

/* extract the rows-observations corresponding to the 5 number summary */
data fivenum; set all; if _n_=1 or _n_=5 or _n_=6 or _n_=7 or _n_=11; run;
proc print; run;

 Obs  Quantile     C0_Est  C80_Est  C160_Est  C235_Est  C310_Est
   1  100% Max       36.0     36.0      31.0      27.0        15
   2  75% Q3         34.0     33.0      30.0      21.0         6
   3  50% Median     32.5     32.5      29.0      16.5         6
   4  25% Q1         30.0     29.0      27.0      13.0         5
   5  0% Min         24.0     26.0      23.0       7.0         0
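Selecting rows by position (_n_=1, 5, 6, 7, 11) depends on the row order of the Quantiles object. A minimal sketch of an alternative is to select on the quantile label itself. It assumes that Quantile is a character variable holding the labels exactly as printed above (e.g. "75% Q3"); the data set name fivenum2 is hypothetical.

```sas
/* Sketch: pick out the five-number summary by label rather than by row number. */
/* Assumes data set ALL from the merge above, with character variable Quantile. */
data fivenum2;                       /* hypothetical data set name */
  set all;
  /* second blank-delimited word of the label: Max, Q3, Median, Q1 or Min; */
  /* labels such as "99%" have no second word and are dropped              */
  if scan(Quantile, 2, ' ') in ('Max', 'Q3', 'Median', 'Q1', 'Min');
run;

proc print data=fivenum2; run;
```

If the labels match the printed output, this should return the same five rows as the _n_-based step.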

Using ODS OUTPUT to create dataset in a simulation

/* Extracting coefficients from simple linear regression simulation*/

options formdlim="-" nodate;

/* generate simulation data sets Y ~ N(mu(x)= 3+2x, sigma=2) */

data sims;
  do dataset=1 to 1000;
    do x=1 to 10;
      y = 3 + 2*x + 2*rannor(0);
      output;
    end;
  end;
run;

/* DEBUG: print to check generated data */
proc print data=sims; run;

/* SORT for data set */
proc sort data=sims; by dataset; run;

/* USE OUTEST to extract the estimated coefficients */
proc reg data=sims outest=myparms;
  by dataset;
  model y=x;
run;

proc print data=myparms; run;

/* HISTOGRAM for estimated slope (OUTEST stores the slope in the variable named x) */
proc gchart data=work.myparms; vbar x; run;

/* Re-do this with ODS */
*ods trace on; * determine what output objects are constructed;

ods output ParameterEstimates=reg_coefs;

proc reg data=sims; by dataset; model y=x; run;

proc print data=reg_coefs; run;

ods output close;

*ods trace off;

proc print data=reg_coefs; run;

proc contents data=reg_coefs; run;

data slopes; set reg_coefs; if Variable="x"; slope=Estimate; keep dataset slope; run;

data intercepts; set reg_coefs; if Variable="Intercept"; Intercept = Estimate; keep dataset intercept; run;

data both; merge slopes intercepts; by dataset; run;

proc gplot data=both;
  title "Plot of estimated slope vs. estimated intercept";
  plot slope*intercept;
run;

proc gchart data=both;
  title "Sampling distribution of the estimated slope";
  vbar slope;
run;

proc gchart data=both;
  title "Sampling distribution of the estimated intercept";
  vbar intercept;
run;

proc print data=slopes; run;
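A numerical summary can complement the histograms. The sketch below assumes the data set both from the merge above. Under the simulation model (sigma=2, x=1,...,10, so Sxx=82.5) the theoretical standard deviations are 2/sqrt(82.5), about 0.220, for the slope, and 2*sqrt(1/10 + 5.5**2/82.5), about 1.366, for the intercept. The simulated values should be close to these but will vary from run to run because rannor(0) seeds from the clock.

```sas
/* Sketch: compare simulated sampling distributions to theory.        */
/* Expect means near 2 (slope) and 3 (intercept), and standard        */
/* deviations near 0.220 (slope) and 1.366 (intercept).               */
proc means data=both n mean std min max;
  var slope intercept;
run;
```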

PROC TABULATE (producing fancier results tables in SAS)

PROC TABULATE <option(s)>;
  CLASS variable(s) </ options>;         * identify classification (grouping) vars;
  FREQ variable;                         * identify variable containing frequency of observation;
  TABLE <<page-expression,> row-expression,> column-expression </ table-option(s)>;
  VAR analysis-variable(s) </ options>;  * identify analysis vars;
  WEIGHT variable;                       * identify variable name – e.g. sampling wts;

* FORMATTING related subcommands …
  CLASSLEV variable(s) / STYLE=<style-element-name | PARENT> <[style-attribute-specification(s)] >;
  KEYLABEL keyword-1='description-1' <...keyword-n='description-n'>;
  KEYWORD keyword(s) / STYLE=<style-element-name | PARENT> <[style-attribute-specification(s)] >;

[* check out results of search for “Tabulate syntax” on www.muohio.edu/quantapps SAS doc]

Comments:

http://www.muohio.edu/quantapps

* concatenation (blank) operator
* crossing (*) operator
* format modifiers
* grouping elements (parentheses) operator
* ALL class variable

data d1;
  infile 'M:\public.www\classes\sta402\SAS-programs\ch2-dat.txt' firstobs=16 expandtabs missover pad;
  input @9 animal 2. @17 conc 3. @25 brood1 2. @33 brood2 2.
        @41 brood3 2. @49 total 2.;
run;

proc tabulate data=d1;
  class conc;
  var brood1 brood2 brood3 total;
  table (brood1 brood2 brood3 total)*conc, min q1 median q3 max;
run;

proc tabulate data=d1;
  class conc;
  var total;
  table conc="Nitrofen Concentration" all, total*(mean var);
run;
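The operators listed in the comments above can be combined in one TABLE statement. The following is a minimal sketch, assuming the data set d1 read above. The labels 'Overall' and 'Total young' and the chosen formats are illustrative choices, not part of the original handout.

```sas
/* Sketch: concatenation, crossing, grouping, format modifiers, and ALL. */
proc tabulate data=d1;
  class conc;
  var total;
  /* rows:    each concentration, concatenated (blank) with an ALL row   */
  /* columns: total crossed (*) with a parenthesized group of statistics, */
  /*          each statistic carrying its own format modifier (*f=)       */
  table conc='Nitrofen Concentration' all='Overall',
        total='Total young'*(n*f=3. mean*f=6.2 std*f=6.2);
run;
```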

Week 04+/- [12+ Sept.] Class Activities

AN INTRODUCTION TO STATISTICAL MODELING

* PROC REG for linear modeling (a very basic introduction)
* PROC GLM for anova models

Other normal response modeling
ANOVA – balanced anova models

Non-normal response modeling
GENMOD – generalized linear models
LOGISTIC – [grouped] binary regression
PROBIT – [grouped] binary regression (INVERSECL)
CATMOD – categorical data modeling

Failure time modeling
LIFEREG – accelerated failure time models
PHREG – Cox’s PH model

And more …

REGRESSION using PROC REG

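Basic Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ [“simple linear regression”]

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \beta_5 X_{i5} + \varepsilon_i$ [“multiple linear regression”]

Error Assumption: $\varepsilon_i \sim$ indep. $N(0, \sigma^2)$

$i = 1, 2, \ldots, n$ [observations]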

/* example sas program that does simple linear regression*/

options ls=75;

data example1;
  input year nboats manatees;
  cards;
77 447 13
78 460 21
79 481 24
80 498 16
81 513 24
82 512 20
83 526 15
84 559 34
85 585 33
86 614 33
87 645 39
88 675 43
89 711 50
90 719 47
;

/* WARNING: ODS RTF will place TITLE information along with SAS date/time/page number
   as part of a header in the RTF document. Check out Print Preview or view the header. */
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\linreg-output.rtf';

proc reg;
  title 'Number of Manatees killed regressed on the number of boats registered in Florida';
  model manatees = nboats / p r cli clm;
  plot manatees*nboats="o" p.*nboats="+" / overlay;
  plot r.*nboats r.*p.;
run;

ODS RTF CLOSE;

Analysis of Variance

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F

Model 1 1711.97866 1711.97866 93.61 <.0001

Error 12 219.44991 18.28749

Corrected Total 13 1931.42857

Root MSE 4.27639 R-Square 0.8864

Dependent Mean 29.42857 Adj R-Sq 0.8769

Coeff Var 14.53141

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|

Intercept 1 -41.43044 7.41222 -5.59 0.0001

nboats 1 0.12486 0.01290 9.68 <.0001
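As a check on how the pieces of this output fit together, the arithmetic below recomputes several reported quantities from others printed in the same listing. Nothing here is new output; small discrepancies are due to rounding in the printed estimates.

```latex
\begin{align*}
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
   = \frac{1711.97866}{18.28749} \approx 93.61 \\[4pt]
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Corrected Total}}}
   = \frac{1711.97866}{1931.42857} \approx 0.8864 \\[4pt]
\text{Root MSE} &= \sqrt{18.28749} \approx 4.2764 \\[4pt]
t_{\text{nboats}} &= \frac{0.12486}{0.01290} \approx 9.68,
   \qquad t_{\text{nboats}}^2 \approx F \ \text{(up to rounding)} \\[4pt]
\widehat{\text{manatees}} &= -41.43044 + 0.12486 \times \text{nboats},
   \qquad \text{e.g. } -41.43044 + 0.12486(447) \approx 14.38
\end{align*}
```

The last line agrees with the predicted value for the first observation (14.3827) to the accuracy of the printed coefficients.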

Output Statistics

Obs   Dep Var manatees   Predicted Value   Std Error Mean Predict   95% CL Mean (lower upper)   95% CL Predict (lower upper)   Residual   Std Error Residual   Student Residual

1 13.0000 14.3827 1.9299 10.1779 18.5876 4.1604 24.6050 -1.3827 3.816 -0.362

2 21.0000 16.0059 1.7974 12.0896 19.9222 5.8989 26.1130 4.9941 3.880 1.287

3 24.0000 18.6280 1.5976 15.1472 22.1089 8.6816 28.5745 5.3720 3.967 1.354

4 16.0000 20.7507 1.4528 17.5853 23.9161 10.9102 30.5911 -4.7507 4.022 -1.181

5 24.0000 22.6236 1.3420 19.6997 25.5475 12.8582 32.3891 1.3764 4.060 0.339

6 20.0000 22.4987 1.3488 19.5600 25.4375 12.7288 32.2687 -2.4987 4.058 -0.616

7 15.0000 24.2468 1.2622 21.4968 26.9968 14.5320 33.9616 -9.2468 4.086 -2.263

8 34.0000 28.3672 1.1482 25.8656 30.8689 18.7198 38.0147 5.6328 4.119 1.367

9 33.0000 31.6137 1.1650 29.0753 34.1520 21.9566 41.2707 1.3863 4.115 0.337

10 33.0000 35.2346 1.2909 32.4221 38.0472 25.5019 44.9673 -2.2346 4.077 -0.548

11 39.0000 39.1054 1.5187 35.7963 42.4144 29.2178 48.9929 -0.1054 3.998 -0.0264

12 43.0000 42.8512 1.7974 38.9349 46.7675 32.7442 52.9582 0.1488 3.880 0.0383

13 50.0000 47.3462 2.1762 42.6048 52.0877 36.8917 57.8007 2.6538 3.681 0.721

14 47.0000 48.3451 2.2647 43.4109 53.2794 37.8018 58.8884 -1.3451 3.628 -0.371

Output Statistics (continued)

Obs    -2-1 0 1 2       Cook's D
  1   |      |      |      0.017
  2   |      |**    |      0.178
  3   |      |**    |      0.149
  4   |    **|      |      0.091
  5   |      |      |      0.006
  6   |     *|      |      0.021
  7   |  ****|      |      0.244
  8   |      |**    |      0.073
  9   |      |      |      0.005
 10   |     *|      |      0.015
 11   |      |      |      0.000
 12   |      |      |      0.000
 13   |      |*     |      0.091
 14   |      |      |      0.027

Sum of Residuals 0

Sum of Squared Residuals 219.44991

Predicted Residual SS (PRESS) 281.76275

Multiple Regression with indicator variables

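$\log(\text{Brain Weight})_i = \beta_0 + \beta_1 \log(\text{Body Weight})_i + \varepsilon_i$

$\log(\text{Brain Weight})_i = \beta_0 + \beta_1 \log(\text{Body Weight})_i + \beta_2 I_{\text{dino},i} + \varepsilon_i$

$\log(\text{Brain Weight})_i = \beta_0 + \beta_1 \log(\text{Body Weight})_i + \beta_2 I_{\text{dino},i} + \beta_3 I_{\text{dino},i} \times \log(\text{Body Weight})_i + \varepsilon_i$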

data mrexample;
  * Lunneborg (1994);
  * body weight brain example;
  * species is read with the default length of 8, so names are truncated (e.g. diplodoc);
  input species $ bodywt brainwt @@;
  logbody = log10(bodywt);
  logbrain = log10(brainwt);
  idino = 0;
  if (species="diplodoc" or species="tricerat" or species="brachios") then idino=1;
  idinobod = idino*logbody;
  cards;
beaver 1.35 8.10 cow 465.00 423.00 wolf 36.33 119.50 goat 27.66 115.00
guipig 1.04 5.50 diplodocus 11700.00 50.00 asielephant 2547.00 4603.00
donkey 187.10 419.00 horse 521.00 655.00 potarmonkey 10.00 115.00
cat 3.30 25.60 giraffe 529.000 680.00 gorilla 207.00 406.00
human 62.00 1320.00 afrelephant 6654.00 5712.00 triceratops 9400.00 70.00
rhemonkey 6.80 179.00 kangaroo 35.00 56.00 hamster 0.12 1.00
mouse 0.023 0.40 rabbit 2.50 12.10 sheep 55.50 175.00
jaguar 100.00 157.00 chimp 52.16 440.00 brachiosaurus 87000.00 154.50
rat 0.28 1.90 mole 0.122 3.00 pig 192.00 180
;

ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\mreg-output.rtf';

proc print;
  title 'brain wt - body wt data';
run;

proc univariate;
  var bodywt brainwt;
  id species;
run;

proc reg;
  title2 'allometric scaling - brain and body wt.';
  title3 '[All Species combined]';
  model logbrain=logbody;
  plot logbrain*logbody="o" p.*logbody="+" / overlay;
  plot r.*logbody;
run;

proc reg;
  title2 'Dinosaurs fitted with potentially different line';
  model logbrain=logbody idino idinobod;
  plot logbrain*logbody="o" p.*logbody="+" / overlay;
  plot r.*logbody;
run;

proc reg;
  title2 'Dinosaurs fitted with potentially different INTERCEPTS';
  model logbrain=logbody idino;
  plot logbrain*logbody="o" p.*logbody="+" / overlay;
  plot r.*logbody;
run;

ODS RTF CLOSE;

Obs species bodywt brainwt logbody logbrain idino idinobod

1 beaver 1.35 8.1 0.13033 0.90849 0 0.00000

2 cow 465.00 423.0 2.66745 2.62634 0 0.00000

3 wolf 36.33 119.5 1.56027 2.07737 0 0.00000

4 goat 27.66 115.0 1.44185 2.06070 0 0.00000

5 guipig 1.04 5.5 0.01703 0.74036 0 0.00000

6 diplodoc 11700.00 50.0 4.06819 1.69897 1 4.06819

7 asieleph 2547.00 4603.0 3.40603 3.66304 0 0.00000

8 donkey 187.10 419.0 2.27207 2.62221 0 0.00000

9 horse 521.00 655.0 2.71684 2.81624 0 0.00000

10 potarmon 10.00 115.0 1.00000 2.06070 0 0.00000

11 cat 3.30 25.6 0.51851 1.40824 0 0.00000


12 giraffe 529.00 680.0 2.72346 2.83251 0 0.00000

13 gorilla 207.00 406.0 2.31597 2.60853 0 0.00000

14 human 62.00 1320.0 1.79239 3.12057 0 0.00000

15 afreleph 6654.00 5712.0 3.82308 3.75679 0 0.00000

16 tricerat 9400.00 70.0 3.97313 1.84510 1 3.97313

17 rhemonke 6.80 179.0 0.83251 2.25285 0 0.00000

18 kangaroo 35.00 56.0 1.54407 1.74819 0 0.00000

19 hamster 0.12 1.0 -0.92082 0.00000 0 0.00000

20 mouse 0.02 0.4 -1.63827 -0.39794 0 0.00000

21 rabbit 2.50 12.1 0.39794 1.08279 0 0.00000

22 sheep 55.50 175.0 1.74429 2.24304 0 0.00000

23 jaguar 100.00 157.0 2.00000 2.19590 0 0.00000

24 chimp 52.16 440.0 1.71734 2.64345 0 0.00000

25 brachios 87000.00 154.5 4.93952 2.18893 1 4.93952

26 rat 0.28 1.9 -0.55284 0.27875 0 0.00000

27 mole 0.12 3.0 -0.91364 0.47712 0 0.00000

28 pig 192.00 180.0 2.28330 2.25527 0 0.00000

brain wt - body wt data

The UNIVARIATE Procedure
Variable: bodywt

BODY WEIGHT

Moments

N 28 Sum Weights 28

Mean 4278.43875 Sum Observations 119796.285

Std Deviation 16480.4904 Variance 271606563

Skewness 5.03388585 Kurtosis 26.0100719

Uncorrected SS 7845918273 Corrected SS 7333377205

Coeff Variation 385.198698 Std Error Mean 3114.51993

Basic Statistical Measures

Location Variability

Mean 4278.439 Std Deviation 16480

Median 53.830 Variance 271606563

Mode . Range 87000

Interquartile Range 490.10000

Tests for Location: Mu0=0

Test Statistic p Value

Student's t t 1.373707 Pr > |t| 0.1808

Sign M 14 Pr >= |M| <.0001

Signed Rank S 203 Pr >= |S| <.0001

Quantiles (Definition 5)

Quantile Estimate

100% Max 87000.000

99% 87000.000

95% 11700.000

90% 9400.000

75% Q3 493.000

50% Median 53.830

25% Q1 2.900

10% 0.122

5% 0.120



1% 0.023

0% Min 0.023

Extreme Observations

Lowest Highest

Value species Obs Value species Obs

0.023 mouse 20 2547 asieleph 7

0.120 hamster 19 6654 afreleph 15

0.122 mole 27 9400 tricerat 16

0.280 rat 26 11700 diplodoc 6

1.040 guipig 5 87000 brachios 25


The UNIVARIATE Procedure
Variable: brainwt

BRAIN WEIGHT

Moments

N 28 Sum Weights 28

Mean 574.521429 Sum Observations 16086.6

Std Deviation 1334.92919 Variance 1782035.94

Skewness 3.33453913 Kurtosis 10.6457044

Uncorrected SS 57357066.9 Corrected SS 48114970.5

Coeff Variation 232.354987 Std Error Mean 252.277904



Mean 574.5214 Std Deviation 1335

Median 137.0000 Variance 1782036

Mode 115.0000 Range 5712




Student's t t 2.277336 Pr > |t| 0.0309

Sign M 14 Pr >= |M| <.0001

Signed Rank S 203 Pr >= |S| <.0001


Quantile Estimate

100% Max 5712.00

99% 5712.00

95% 4603.00

90% 1320.00

75% Q3 421.00

50% Median 137.00

25% Q1 18.85

10% 1.90

5% 1.00



1% 0.40

0% Min 0.40


Lowest Highest

Value species Obs Value species Obs

0.4 mouse 20 655 horse 9

1.0 hamster 19 680 giraffe 12

1.9 rat 26 1320 human 14

3.0 mole 27 4603 asieleph 7

5.5 guipig 5 5712 afreleph 15


Source   DF   Sum of Squares   Mean Square   F Value   Pr > F


Model 1 17.81230 17.81230 40.26 <.0001

Error 26 11.50305 0.44242




Coeff Var 34.60816

Parameter Estimates


Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|


Intercept 1 1.10958 0.17942 6.18 <.0001

logbody 1 0.49599 0.07817 6.35 <.0001


Source   DF   Sum of Squares   Mean Square   F Value   Pr > F


Model 3 27.01211 9.00404 93.82 <.0001

Error 24 2.30324 0.09597




Coeff Var 16.11844

Parameter Estimates


Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|


Intercept 1 0.93391 0.08562 10.91 <.0001

logbody 1 0.75226 0.04493 16.74 <.0001

idino 1 -0.91748 1.79054 -0.51 0.6131

idinobod 1 -0.31441 0.41371 -0.76 0.4547


Source   DF   Sum of Squares   Mean Square   F Value   Pr > F


Model 2 26.95668 13.47834 142.86 <.0001

Error 25 2.35867 0.09435




Coeff Var 15.98167

Parameter Estimates


Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|


Intercept 1 0.93879 0.08465 11.09 <.0001

logbody 1 0.74855 0.04428 16.90 <.0001

idino 1 -2.26674 0.23024 -9.84 <.0001
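The indicator-variable fits can be read as separate lines for dinosaurs and non-dinosaurs on the log10 scale. The worked arithmetic below uses only the parameter estimates and error sums of squares printed above. The partial F statistic is a standard full-versus-reduced model comparison, added here as an illustration rather than taken from the SAS output.

```latex
\begin{align*}
\intertext{Different intercepts, common slope (logbody, idino):}
\text{non-dinosaurs:}\quad \widehat{\log(\text{Brain})} &= 0.93879 + 0.74855\,\log(\text{Body}) \\
\text{dinosaurs:}\quad \widehat{\log(\text{Brain})} &= (0.93879 - 2.26674) + 0.74855\,\log(\text{Body})
   = -1.32795 + 0.74855\,\log(\text{Body}) \\
\intertext{Different intercepts and slopes (logbody, idino, idinobod):}
\text{non-dinosaurs:}\quad \widehat{\log(\text{Brain})} &= 0.93391 + 0.75226\,\log(\text{Body}) \\
\text{dinosaurs:}\quad \widehat{\log(\text{Brain})} &= (0.93391 - 0.91748) + (0.75226 - 0.31441)\,\log(\text{Body})
   = 0.01643 + 0.43785\,\log(\text{Body}) \\
\intertext{Partial $F$ test for the interaction term (1 and 24 df):}
F &= \frac{(SSE_{\text{reduced}} - SSE_{\text{full}})/1}{MSE_{\text{full}}}
   = \frac{2.35867 - 2.30324}{0.09597} \approx 0.58 \approx (-0.76)^2
\end{align*}
```

The partial F agrees with the square of the t statistic for idinobod (p = 0.4547), so these data give little evidence that the dinosaur slope differs once a separate intercept is allowed.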

One-way ANOVA

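Basic Model: $Y_{ij} = \mu_i + \varepsilon_{ij}$ [“cell means” coding]
$= \mu + \tau_i + \varepsilon_{ij}$ [“effects” coding]

(constraint for estimation? $\tau_1 = 0$ or $\tau_g = 0$ or $\sum_i \tau_i = 0$)

Error Assumption: $\varepsilon_{ij} \sim$ indep. $N(0, \sigma^2)$

$i = 1, 2, \ldots, g$ [treatments or populations]
$j = 1, 2, \ldots, n_i$ [replications]

$H_0: \mu_1 = \cdots = \mu_g$ or equivalently, $H_0: \tau_1 = \cdots = \tau_g = 0$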

/* Bacteria in meat under 4 different conditions */

options ls = 75;

data meat;

  input condition $ logcount @@;
  datalines;
Plastic 7.66 Plastic 6.98 Plastic 7.80
Vacuum 5.26 Vacuum 5.44 Vacuum 5.80
Mixed 7.41 Mixed 7.33 Mixed 7.04
Co2 3.51 Co2 2.91 Co2 3.66
;

title 'bacteria growth under 4 packaging conditions';

ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\oneway-output.rtf';

proc boxplot;
  plot logcount*condition;
run;

proc glm data=meat order=data;
  title2 'fitting the one-way anova model via GLM';
  class condition;
  model logcount = condition;
  means condition / bon tukey scheffe cldiff lines;
  lsmeans condition / cl pdiff;
  /* with order=data the levels are Plastic Vacuum Mixed Co2 */
  contrast 'plastic vs. rest' condition 3 -1 -1 -1;
  output out=new p=yhat r=resid stdr=eresid;
run;

proc plot data=new;
  title2 'residual analyses';
  plot resid*yhat;
run;

proc univariate data=new plot;
  var resid;
run;

proc boxplot data=new;
  plot resid*condition;
run;

ODS RTF CLOSE;

bacteria growth under 4 packaging conditions
fitting the one-way anova model via GLM

The GLM Procedure

Class Level Information

Class Levels Values

condition 4 Plastic Vacuum Mixed Co2

Number of observations 12


The GLM Procedure

Dependent Variable: logcount

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F

Model 3 32.87280000 10.95760000 94.58 <.0001

Error 8 0.92680000 0.11585000


R-Square   Coeff Var   Root MSE   logcount Mean
0.972580   5.768940    0.340367   5.900000

Source DF Type I SS Mean Square F Value Pr > F

condition 3 32.87280000 10.95760000 94.58 <.0001

Source DF Type III SS Mean Square F Value Pr > F

condition 3 32.87280000 10.95760000 94.58 <.0001


The GLM Procedure

Tukey's Studentized Range (HSD) Test for logcount

NOTE: This test controls the Type I experimentwise error rate.

Alpha 0.05

Error Degrees of Freedom 8

Error Mean Square 0.11585

Critical Value of Studentized Range 4.52880

Minimum Significant Difference 0.89

Comparisons significant at the 0.05 level are indicated by ***.

condition Comparison   Difference Between Means   Simultaneous 95% Confidence Limits

Plastic - Mixed 0.2200 -0.6700 1.1100

Plastic - Vacuum 1.9800 1.0900 2.8700 ***

Plastic - Co2 4.1200 3.2300 5.0100 ***

Mixed - Plastic -0.2200 -1.1100 0.6700

Mixed - Vacuum 1.7600 0.8700 2.6500 ***

Mixed - Co2 3.9000 3.0100 4.7900 ***

Vacuum - Plastic -1.9800 -2.8700 -1.0900 ***

Vacuum - Mixed -1.7600 -2.6500 -0.8700 ***

Vacuum - Co2 2.1400 1.2500 3.0300 ***

Co2 - Plastic -4.1200 -5.0100 -3.2300 ***

Co2 - Mixed -3.9000 -4.7900 -3.0100 ***

Co2 - Vacuum -2.1400 -3.0300 -1.2500 ***


The GLM Procedure

Bonferroni (Dunn) t Tests for logcount

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.

Alpha 0.05



Critical Value of t 3.47888



condition Comparison   Difference Between Means   Simultaneous 95% Confidence Limits

Plastic - Mixed 0.2200 -0.7468 1.1868

Plastic - Vacuum 1.9800 1.0132 2.9468 ***

Plastic - Co2 4.1200 3.1532 5.0868 ***

Mixed - Plastic -0.2200 -1.1868 0.7468

Mixed - Vacuum 1.7600 0.7932 2.7268 ***

Mixed - Co2 3.9000 2.9332 4.8668 ***

Vacuum - Plastic -1.9800 -2.9468 -1.0132 ***

Vacuum - Mixed -1.7600 -2.7268 -0.7932 ***

Vacuum - Co2 2.1400 1.1732 3.1068 ***

Co2 - Plastic -4.1200 -5.0868 -3.1532 ***

Co2 - Mixed -3.9000 -4.8668 -2.9332 ***

Co2 - Vacuum -2.1400 -3.1068 -1.1732 ***


The GLM Procedure

Scheffe's Test for logcount

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.

Alpha 0.05



Critical Value of F 4.06618



condition Comparison   Difference Between Means   Simultaneous 95% Confidence Limits

Plastic - Mixed 0.2200 -0.7506 1.1906

Plastic - Vacuum 1.9800 1.0094 2.9506 ***

Plastic - Co2 4.1200 3.1494 5.0906 ***

Mixed - Plastic -0.2200 -1.1906 0.7506

Mixed - Vacuum 1.7600 0.7894 2.7306 ***

Mixed - Co2 3.9000 2.9294 4.8706 ***

Vacuum - Plastic -1.9800 -2.9506 -1.0094 ***

Vacuum - Mixed -1.7600 -2.7306 -0.7894 ***

Vacuum - Co2 2.1400 1.1694 3.1106 ***

Co2 - Plastic -4.1200 -5.0906 -3.1494 ***

Co2 - Mixed -3.9000 -4.8706 -2.9294 ***

Co2 - Vacuum -2.1400 -3.1106 -1.1694 ***


The GLM Procedure

Tukey's Studentized Range (HSD) Test for logcount

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha 0.05





Means with the same letter are not significantly different.

Tukey Grouping Mean N condition

A 7.4800 3 Plastic

A

A 7.2600 3 Mixed

B 5.5000 3 Vacuum

C 3.3600 3 Co2


The GLM Procedure

Bonferroni (Dunn) t Tests for logcount


Alpha 0.05



Critical Value of t 3.47888



Bon Grouping Mean N condition

A 7.4800 3 Plastic

A

A 7.2600 3 Mixed

B 5.5000 3 Vacuum

C 3.3600 3 Co2


The GLM Procedure

Scheffe's Test for logcount

NOTE: This test controls the Type I experimentwise error rate.

Alpha 0.05



Critical Value of F 4.06618



Scheffe Grouping Mean N condition

A 7.4800 3 Plastic

A

A 7.2600 3 Mixed

B 5.5000 3 Vacuum

C 3.3600 3 Co2


The GLM Procedure
Least Squares Means

condition   logcount LSMEAN   LSMEAN Number

Plastic 7.48000000 1

Vacuum 5.50000000 2

Mixed 7.26000000 3

Co2 3.36000000 4

Least Squares Means for effect condition
Pr > |t| for H0: LSMean(i)=LSMean(j)

i/j        1         2         3         4
1                 <.0001    0.4514    <.0001
2     <.0001                0.0002    <.0001
3     0.4514    0.0002                <.0001
4     <.0001    <.0001    <.0001

condition   logcount LSMEAN   95% Confidence Limits

Plastic 7.480000 7.026844 7.933156

Vacuum 5.500000 5.046844 5.953156

Mixed 7.260000 6.806844 7.713156

Co2 3.360000 2.906844 3.813156

Least Squares Means for Effect condition

i   j   Difference Between Means   95% Confidence Limits for LSMean(i)-LSMean(j)

1 2 1.980000 1.339141 2.620859

1 3 0.220000 -0.420859 0.860859

1 4 4.120000 3.479141 4.760859

2 3 -1.760000 -2.400859 -1.119141

2 4 2.140000 1.499141 2.780859

3 4 3.900000 3.259141 4.540859



NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.


The GLM Procedure


Contrast DF Contrast SS Mean Square F Value Pr > F

plastic vs. rest 1 9.98560000 9.98560000 86.19 <.0001
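The contrast sum of squares can be reproduced by hand from the treatment means reported above (Plastic 7.48, Vacuum 5.50, Mixed 7.26, Co2 3.36, each with n = 3) and the error mean square 0.11585. The sketch below uses the standard formula for a contrast in a balanced one-way layout, with coefficients (3, -1, -1, -1) in the order=data ordering.

```latex
\begin{align*}
\hat{L} &= \sum_i c_i \bar{Y}_{i\cdot}
        = 3(7.48) - 5.50 - 7.26 - 3.36 = 6.32 \\[4pt]
SS_{\text{contrast}} &= \frac{\hat{L}^2}{\sum_i c_i^2 / n}
        = \frac{(6.32)^2}{(9 + 1 + 1 + 1)/3}
        = \frac{39.9424}{4} = 9.9856 \\[4pt]
F &= \frac{SS_{\text{contrast}}/1}{MS_{\text{Error}}}
        = \frac{9.9856}{0.11585} \approx 86.19
\end{align*}
```

Both values agree with the Contrast SS and F Value printed by PROC GLM.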

bacteria growth under 4 packaging conditions
residual analyses

[Plot of resid*yhat (PROC PLOT output; legend: A = 1 obs, B = 2 obs, etc.). Vertical axis: resid, from -0.5 to 0.4. Horizontal axis: yhat, from 3.0 to 7.5.]


The UNIVARIATE Procedure
Variable: resid

Moments

N 12 Sum Weights 12

Mean 0 Sum Observations 0

Std Deviation 0.29026634 Variance 0.08425455

Skewness -0.6294875 Kurtosis -0.971163

Uncorrected SS 0.9268 Corrected SS 0.9268

Coeff Variation . Std Error Mean 0.08379267



Mean 0.000000 Std Deviation 0.29027

Median 0.110000 Variance 0.08425

Mode 0.300000 Range 0.82000




Student's t t 0 Pr > |t| 1.0000

Sign M 1 Pr >= |M| 0.7744

Signed Rank S 2 Pr >= |S| 0.8931


Quantile Estimate

100% Max 0.32

99% 0.32

95% 0.32

90% 0.30

75% Q3 0.24

50% Median 0.11

25% Q1 -0.23

10% -0.45

5% -0.50



1% -0.50

0% Min -0.50


Lowest Highest

Value Obs Value Obs

-0.50 2 0.15 7

-0.45 11 0.18 1

-0.24 4 0.30 6

-0.22 9 0.30 12

-0.06 5 0.32 3

Stem Leaf    #
   3 002     3
   2
   1 558     3
   0 7       1
  -0 6       1
  -1
  -2 42      2
  -3
  -4 5       1
  -5 0       1
Multiply Stem.Leaf by 10**-1

[Boxplot and normal probability plot of resid (PROC UNIVARIATE PLOT output): character plots not reproduced here.]

Two-way ANOVA/ Factorial example 2 - interaction plots
Patient Waiting Time data

Factorial anova model

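Basic Model: $Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$ [“cell means” coding]
$= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$ [“effects” coding]

Error Assumption: $\varepsilon_{ijk} \sim$ indep. $N(0, \sigma^2)$

$i = 1, 2, \ldots, a$ [levels of factor A]
$j = 1, 2, \ldots, b$ [levels of factor B]
$k = 1, 2, \ldots, n_{ij}$ [replications]

$H_0$: all $(\alpha\beta)_{ij} = 0$ [no interaction]

$H_0$: all $\alpha_i = 0$ [no A main effect]
$H_0$: all $\beta_j = 0$ [no B main effect]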

ODS ESCAPECHAR="^"; /* for fancy formatting later */

ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\twoway-output.rtf';

title 'Two-way ANOVA/ Factorial example 2 - interaction plots';
title2 'Patient Waiting Time data';

data cwait;
  input doctype $ practype $ time @@;
  cards;
gen group 15 gen group 20 gen group 25 gen group 20
gen solo 20 gen solo 25 gen solo 30 gen solo 25
spec group 30 spec group 25 spec group 30 spec group 35
spec solo 25 spec solo 20 spec solo 30 spec solo 30
;

proc print; run;

proc sort; by doctype practype; run;

proc means noprint;
  by doctype practype;
  output out=factmean mean=timemean;
run;

proc plot data=factmean;
  plot timemean*doctype=practype / vaxis=0 to 35 by 5;
run;

proc glm data=cwait order=data;
  class doctype practype;
  model time=doctype|practype;
  /* equivalent model statement
     model time=doctype practype doctype*practype;
  */
  output out=new p=yhat r=resid;
  lsmeans doctype practype doctype*practype / stderr pdiff;
  means doctype practype doctype*practype / tukey;
run;

ODS RTF CLOSE;

Obs doctype practype time

  1   gen    group   15
  2   gen    group   20
  3   gen    group   25
  4   gen    group   20
  5   gen    solo    20
  6   gen    solo    25
  7   gen    solo    30
  8   gen    solo    25
  9   spec   group   30
 10   spec   group   25
 11   spec   group   30
 12   spec   group   35
 13   spec   solo    25
 14   spec   solo    20
 15   spec   solo    30
 16   spec   solo    30


[Plot of timemean*doctype (PROC PLOT output; symbol is value of practype). Vertical axis: timemean, from 0 to 35. Horizontal axis: doctype (gen, spec). Plotted cell means: gen: g at 20, s at 25; spec: g at 30, s between 25 and 30.]


The GLM Procedure

Class Level Information

Class       Levels   Values

doctype          2   gen spec
practype         2   group solo

Number of observations 16


The GLM Procedure

Dependent Variable: time

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F

Model     3      204.6875000    68.2291667      3.74   0.0415

Error    12      218.7500000    18.2291667


R-Square Coeff Var Root MSE time Mean

0.483395 16.86741 4.269563 25.31250

Source             DF     Type I SS     Mean Square   F Value   Pr > F

doctype             1   126.5625000     126.5625000      6.94   0.0218

practype            1     1.5625000       1.5625000      0.09   0.7747

doctype*practype    1    76.5625000      76.5625000      4.20   0.0629

Source             DF   Type III SS     Mean Square   F Value   Pr > F

doctype             1   126.5625000     126.5625000      6.94   0.0218

practype            1     1.5625000       1.5625000      0.09   0.7747

doctype*practype    1    76.5625000      76.5625000      4.20   0.0629
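As a check on how to read these tables, the arithmetic below reproduces the F values and fit statistics from the sums of squares printed above. Each F value is the effect's mean square divided by the error mean square (18.2291667 on 12 df). The total sum of squares, 423.4375, is not printed in the output shown here; it is obtained by adding the Model and Error lines. Because the design is balanced (4 observations per cell), the Type I and Type III sums of squares agree.

```latex
\[
F_{\text{doctype}} = \frac{126.5625}{18.2291667} \approx 6.94, \qquad
F_{\text{practype}} = \frac{1.5625}{18.2291667} \approx 0.09, \qquad
F_{\text{doctype*practype}} = \frac{76.5625}{18.2291667} = 4.20
\]
\[
F_{\text{Model}} = \frac{204.6875 / 3}{218.75 / 12} = \frac{68.2291667}{18.2291667} \approx 3.74
\]
\[
R^2 = \frac{204.6875}{204.6875 + 218.75} = \frac{204.6875}{423.4375} \approx 0.4834, \qquad
\text{Root MSE} = \sqrt{18.2291667} \approx 4.2696, \qquad
\text{Coeff Var} = 100 \times \frac{4.2696}{25.3125} \approx 16.87
\]
```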



doctype   time LSMEAN   Standard Error   H0:LSMEAN=0   H0:LSMean1=LSMean2
                                            Pr > |t|              Pr > |t|

gen        22.5000000        1.5095184        <.0001                0.0218

spec       28.1250000        1.5095184        <.0001

practype   time LSMEAN   Standard Error   H0:LSMEAN=0   H0:LSMean1=LSMean2
                                             Pr > |t|              Pr > |t|

group       25.0000000        1.5095184        <.0001                0.7747

solo        25.6250000        1.5095184        <.0001

doctype   practype   time LSMEAN   Standard Error   Pr > |t|   LSMEAN Number

gen       group       20.0000000        2.1347814     <.0001               1

gen       solo        25.0000000        2.1347814     <.0001               2

spec      group       30.0000000        2.1347814     <.0001               3

spec      solo        26.2500000        2.1347814     <.0001               4

Least Squares Means for effect doctype*practype
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: time

i/j        1        2        3        4

1               0.1236   0.0062   0.0607
2     0.1236             0.1236   0.6861
3     0.0062   0.1236             0.2379
4     0.0607   0.6861   0.2379

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
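The standard errors and pairwise p-values in the LSMEANS output can be traced back to the error mean square. The sketch below uses only values printed in the output (MSE = 18.2291667 on 12 df, 4 observations per cell, 8 per factor level) and works one comparison, gen/group (LSMEAN number 1) versus spec/group (number 3), as an illustration.

```latex
\[
SE(\text{cell LSMEAN}) = \sqrt{\frac{\text{MSE}}{4}} = \sqrt{\frac{18.2291667}{4}} \approx 2.1348, \qquad
SE(\text{main-effect LSMEAN}) = \sqrt{\frac{\text{MSE}}{8}} = \sqrt{\frac{18.2291667}{8}} \approx 1.5095
\]
\[
t_{1,3} = \frac{20.00 - 30.00}{\sqrt{\text{MSE}\left(\frac{1}{4} + \frac{1}{4}\right)}}
        = \frac{-10}{\sqrt{9.1145833}} \approx \frac{-10}{3.0190} \approx -3.31
\]
```

Referred to a t distribution with 12 df, this statistic corresponds to the 0.0062 entry in row 1, column 3 of the matrix above.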


The GLM Procedure

Tukey's Studentized Range (HSD) Test for time


Alpha 0.05






Tukey Grouping Mean N doctype

A 28.125 8 spec

B 22.500 8 gen


The GLM Procedure



Alpha 0.05






Tukey Grouping Mean N practype

A 25.625 8 solo

A

A 25.000 8 group


The GLM Procedure


Level of   Level of         ------------time------------
doctype    practype    N          Mean          Std Dev

gen        group       4    20.0000000       4.08248290

gen        solo        4    25.0000000       4.08248290

spec       group       4    30.0000000       4.08248290

spec       solo        4    26.2500000       4.78713554




Bonus Material – if you end up using SAS on the Cluster

Unix day!

Create a file containing the following SAS code in your favorite word processor.

/* collin-sim.sas; first version: collin-s96.sas
   constructed to illustrate collinearity in regression class */
options ls=74;
title analyses illustrating co-linearity in multiple regression;
data d1;
  input x y @@;
  x2 = x + ranuni(0);
  x3 = x - ranuni(0);
cards;
10.0 8.04 8.0 6.95 13.0 7.58 9.0 8.81 11.0 8.33 14.0 9.96
6.0 7.24 4.0 4.26 12.0 10.84 7.0 4.82 5.0 5.68
;
proc print; run;
proc reg; model y = x; run;
proc corr; var x x2 x3; run;
proc reg data=d1;
  model y = x x2 x3 / r influence tol vif collinoint xpx i;
  output out=p1out r=resid p=pred;
run;

MacOS
1. start "X11" application - look in ../applications/utilities
2. ssh -X [email protected]
3. R ... sas ... em (enterprise miner)

(file transfer via "fugu" in applications)

Windows OS
1. "putty" to start terminal
2. "Xming" to display X graphics
3. "WinSCP" to transfer files ...

1. Start up an "ssh" or "putty" [old days … TELNET or XTERM] client and log in to the redhawk.hpc.muohio.edu cluster [need account]

2. Basic UNIX shell commands – WARNING: UNIX CaSe-SEnsITive

“passwd” - change your password

“man” – get “manuals”/help for a particular command (e.g. man ls)

“ls” – list the files in your current working directory (e.g. ls -l)

“rm” – deletes a file

“mv” – moves or renames a file (e.g. mv old.sas new.sas); use “cp” to copy a file

“cat” – concatenate and display file (e.g. “cat <filename>” prints file; “cat file1 file2 > file3” concatenates file1 and file2 and writes result to file3) [> directs output in this example]

“grep” – search a file for a pattern (the name comes from “global regular expression print”)

“pwd” – print name of the current working directory

“mkdir” – create a directory in the current working directory (e.g. mkdir sta402)

“rmdir” – remove a directory

“cd” – change directory (e.g. cd sta402 to move into the “sta402” directory; cd .. to move up one level in the directory tree)

“more <filename>” - display the contents of file “filename” (space bar = next page; b = previous page; q = quit more; /name = search for name)

“head <filename>” – display the first few lines of a file

“tail <filename>” – display the last few lines of a file




“sas <filename.sas>” – causes SAS to execute commands in “filename.sas” in batch mode

“Splus” – starts S-Plus (provides “>” prompt; q() = quit S+)

“exit” or “logout” to end session
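To see several of the commands above working together, here is a minimal sketch that runs in a throwaway directory created with mktemp, so nothing in your account is touched. The file contents and the names old.sas, new.sas, backup.sas and both.sas are made up for illustration; only standard commands (mkdir, cd, pwd, mv, cp, cat, head, tail, grep, wc, rm, ls) are used.

```shell
#!/bin/sh
# Work in a temporary directory (assumption: mktemp is available, as on Linux and macOS).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir sta402                      # create a class directory
cd sta402 || exit 1               # move into it
here=$(basename "$(pwd)")         # pwd prints the full path; keep only the last part

# A small stand-in for a SAS program (hypothetical content, 6 lines).
printf 'data d1;\n  input x y;\ncards;\n1 2\n;\nproc print; run;\n' > old.sas

mv old.sas new.sas                # rename: old.sas no longer exists afterwards
cp new.sas backup.sas             # copy: both files exist afterwards

first_line=$(head -n 1 new.sas)   # first line of the file
last_line=$(tail -n 1 new.sas)    # last line of the file
n_proc=$(grep -c "proc" new.sas)  # number of lines containing "proc"

cat new.sas backup.sas > both.sas          # concatenate two files into a third
n_lines=$(wc -l < both.sas | tr -d ' ')    # line count, with any padding removed

rm backup.sas                              # delete a file
n_files=$(ls | wc -l | tr -d ' ')          # how many files are left

echo "directory: $here"
echo "first line: $first_line"
echo "last line: $last_line"
echo "lines with proc: $n_proc; lines in both.sas: $n_lines; files left: $n_files"

cd / && rm -r "$workdir"          # clean up the temporary directory
```

Running each line by hand at the prompt, followed by ls, is a good way to watch files appear and disappear.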

* can link commands together (“piping”)

ls -l | more
ls -l | grep "drwx"

* deleting characters - <ctrl>-u deletes everything to the left of the cursor on the current command line

* can recall earlier commands using arrow keys

* “history” gives a list of all commands issued – select a numbered command by “!<number>”

* can edit previous commands

ls -l | grep "Sp"

^Sp^Sep     [reruns the previous command with “Sp” replaced by “Sep”, i.e. ls -l | grep "Sep"]
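The piping idea from the list above can be tried safely with the sketch below. It builds a temporary directory containing two subdirectories and three files (all names are made up for illustration) and then chains commands with "|". Permissions are set explicitly with chmod so that the directory lines in the long listing start with "drwx" regardless of your account's defaults. History recall (arrow keys, !<number>) only works at an interactive prompt, so it is not shown in this script.

```shell
#!/bin/sh
# Temporary playground (assumption: mktemp is available, as on Linux and macOS).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir sta402 sta502
chmod 755 sta402 sta502               # predictable permissions: drwxr-xr-x
touch notes.txt week3.sas week4.sas   # three empty files

# Long listing piped to grep: count the lines that describe directories.
n_dirs=$(ls -l | grep -c '^drwx')

# Two pipes in a row: list names, keep the .sas files, count them.
n_sas=$(ls | grep '\.sas$' | wc -l | tr -d ' ')

# A pipeline can end in a redirect (>) so the result lands in a file.
ls | grep 'sta' > class-dirs.txt
first_dir=$(head -n 1 class-dirs.txt)

echo "directories: $n_dirs; SAS files: $n_sas; first class directory: $first_dir"

cd / && rm -r "$workdir"              # clean up
```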

3. Log onto redhawk.hpc.muohio.edu (via an ssh/putty session)
a. change your password
b. create a directory for this class (mkdir sta402 or mkdir sta502 or …)

4. FTP onto unixgen.muohio.edu
a. cd sta402
b. “put” the SAS file that you created earlier
c. quit ftp

5. Run SAS on the commands in the file that you transferred.
a. Type “sas filename” (omit the .sas extension)
b. This will create a log file (filename.log) and a listing file (filename.lst)
c. Use more to check the filename.log and the filename.lst file.
