Week 3/4 [06+ Sept.] Class Activities
File: week-03-04-10sep07.doc
Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts
Week 3 Topic -- REPORT WRITING
* Introduce the Output Delivery System (ODS) for customizing procedure output
* PROC TABULATE for producing nicely-formatted tables
Week 4 Topic – INTRODUCTION TO MODELING PROCS
* REG and GLM primarily
Bonus Material – conversational UNIX
ODS References
Gupta, S. (2003) Quick Results with the Output Delivery System. SAS Institute Inc., Cary, NC USA.
Delwiche LD and Slaughter SJ. (2003) The Little SAS Book: A Primer, 3rd edition. SAS Institute. Cary, NC, USA. [pages 144-157]
Haworth LE (2001) Output Delivery System: The Basics. SAS Institute Inc. Cary, NC USA.
ODS Basics
What is ODS?
* method of delivering output in a variety of formats (other than the default “listing” format)
* options available include HTML, Rich Text Format (RTF), PS, PDF, SAS data sets
Basic ODS Terminology
“destinations” – locations to which ODS routes output (e.g. LISTING, HTML, RTF, PRINTER, PDF, OUTPUT – new data set)
“objects” – output entities created by ODS to store the formatted results
“styles” – font/color/other attributes of a report
Basic syntax of ODS statements
* identify output objects;
ODS TRACE ON </options>;
* open output destination;
ODS destination <FILE=filename>;
* create SAS data set with output object;
ODS OUTPUT output-object-name=SAS-data-set-name;
* [optional] select particular objects for inclusion;
ODS <destination> SELECT output-object-name;
PROC … PROC …
PROC …
ODS <destination> CLOSE;
ODS TRACE OFF;
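To make the template above concrete, here is a minimal sketch that pushes one PROC through the same sequence. It assumes the SASHELP.CLASS sample data set that ships with SAS; the file name ods-sketch.html and the data set name moments_ds are hypothetical, and the object names Moments and Quantiles are the ones PROC UNIVARIATE normally reports under ODS TRACE, so confirm them in your own log.

```sas
/* sketch only: sashelp.class, ods-sketch.html and moments_ds are
   illustrative choices, not part of the course files */
ods trace on;                          /* list object names in the log */
ods html file="ods-sketch.html";       /* open the HTML destination    */
ods html select Quantiles;             /* HTML keeps only this object  */
ods output Moments=work.moments_ds;    /* Moments table -> data set    */

proc univariate data=sashelp.class;
  var height;
run;

ods html close;
ods trace off;

proc print data=work.moments_ds;       /* inspect the captured object  */
run;
```

The SELECT list applies to the HTML destination only, so the OUTPUT destination still receives the Moments object.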
ODS to different file types
A familiar example
proc format; value totfmt 0='none' 1-HIGH='some' ;
data d1;
  infile "\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\ch2-dat.txt"
         firstobs=16 expandtabs missover pad;
  * infile 'M:\public.www\classes\sta402\SAS-programs\ch2-dat.txt' firstobs=16 expandtabs missover pad ;
  input @9 animal 2. @17 conc 3. @25 brood1 2. @33 brood2 2. @41 brood3 2. @49 total 2.;
  cbrood3 = brood3;
  format cbrood3 totfmt.;
  label animal = 'animal ID number';
  label conc = 'Nitrofen concentration';
  label brood1 = 'number of young in first brood';
  label brood2 = 'number of young in 2nd brood';
  label brood3 = 'number of young in 3rd brood';
  label total = 'total young produced in three broods';
run;
proc print;
where conc=0; run;
/* aside: ODS LISTING open as a default. You can have multiple destinations open simultaneously. If you want to close the LISTING destination before generating output then type ODS LISTING CLOSE; before issuing the PROC for which output is desired.*/
/* generate HTML files with objects from 3 PROCs */
ODS TRACE ON;
* ODS HTML file='M:\public.www\classes\sta402\SAS-programs\day6-example.html';
ODS HTML file="\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\ODS-HTML-example.html";
proc plot;
plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;
proc freq;
table conc*cbrood3 / nopct nocol chisq trend exact;
run;
proc univariate plot; by conc;
var total;
run;
ODS HTML CLOSE;
ODS TRACE OFF;
/* now generate HTML files with additional linkage info */
ODS TRACE ON;
ODS HTML path='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs'
         body = 'day6-example2.html'            /* Output objects */
         contents = 'day6-example2-TOC.html'    /* Table of contents */
         frame = 'day6-example2-frame.html'     /* organizes display */
         newfile = NONE;                        /* all results to one file */
/* old code referenced the M drive instead of the full path:
   ODS HTML path='M:\public.www\classes\sta402\SAS-programs'
   with the same body=, contents=, frame= and newfile= options as above */
/* comment: “newfile=NONE” directs all output to the same body file
   newfile=PAGE – creates new body file for each page of output
*/
proc plot;
plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;
proc freq;
table conc*cbrood3 / nopct nocol chisq trend exact;
run;
proc univariate plot; by conc;
var total;
run;
ODS HTML CLOSE;
ODS TRACE OFF;
/* select on one of the output objects for inclusion */
*ODS HTML file='M:\public.www\classes\sta402\SAS-programs\day6-example3.html';
ODS HTML file="\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example3.html";
ODS HTML SELECT SSPLOTS;
ODS HTML SHOW; /* write details to SASLOG confirming object sel. */
proc univariate plot; by conc;
var total;
run;
ODS HTML CLOSE;
/* select different destinations */
options orientation=landscape nocenter nodate;
ODS ESCAPECHAR= "^"; /* for fancy formatting later */
/* old program with M drive reference
ODS RTF file='M:\public.www\classes\sta402\SAS-programs\day6-example.rtf';
ODS PDF file='M:\public.www\classes\sta402\SAS-programs\day6-example.pdf';
ODS PS file='M:\public.www\classes\sta402\SAS-programs\day6-example.ps';
*/
ODS RTF file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.rtf';
ODS PDF file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.pdf';
ODS PS file='\\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\SAS-programs\day6-example.ps';
Title 'Plot of number of young vs. Nitrofen concentration^{super a}';
Footnote1 '^{super a}s=some young produced in Brood 3, n= no young produced in Brood 3';
proc plot;
plot total*conc=cbrood3 / vaxis=0 to 40 by 2;
run;
ODS RTF CLOSE;
ODS PDF CLOSE;
ODS PS CLOSE;
ODS to create output data sets
proc sort data=d1; by conc;
ODS TRACE ON; /* see what ODS objects are created by univariate */
proc univariate data=d1; by conc; var total; run;
ODS TRACE OFF;
ODS OUTPUT Quantiles=data_quant; /* extract quantiles */
proc univariate data=d1; by conc; var total; run;
ODS OUTPUT CLOSE;
proc print data=data_quant; run;

Obs   conc   VarName   Quantile      Estimate
  1      0   total     100% Max          36.0
  2      0   total     99%               36.0
  3      0   total     95%               36.0
  4      0   total     90%               35.0
  5      0   total     75% Q3            34.0
  6      0   total     50% Median        32.5
  7      0   total     25% Q1            30.0
  8      0   total     10%               25.5
  9      0   total     5%                24.0
 10      0   total     1%                24.0
 11      0   total     0% Min            24.0
. . . . . . . . . . . . . edited output . . . . . . . . . . . . . . . .
 45    310   total     100% Max          15.0
 46    310   total     99%               15.0
 47    310   total     95%               15.0
 48    310   total     90%               11.0
 49    310   total     75% Q3             6.0
 50    310   total     50% Median         6.0
 51    310   total     25% Q1             5.0
 52    310   total     10%                2.0
 53    310   total     5%                 0.0
 54    310   total     1%                 0.0
 55    310   total     0% Min             0.0
/* can also create multiple data sets */
ODS OUTPUT Quantiles(MATCH_ALL=conc_name_macro)=data_quant;
proc univariate data=d1; by conc; var total; run;
ODS OUTPUT CLOSE;
proc print data=data_quant; run;

from the SAS LOG file
NOTE: The data set WORK.DATA_QUANT has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=0
NOTE: The data set WORK.DATA_QUANT1 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=80
NOTE: The data set WORK.DATA_QUANT2 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=160
NOTE: The data set WORK.DATA_QUANT3 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=235
NOTE: The data set WORK.DATA_QUANT4 has 11 observations and 4 variables.
NOTE: The above message was for the following by-group: Nitrofen concentration=310
/* write the data set names to the SAS LOG */
%put The conc_name_macro variables contains the following data sets &conc_name_macro;

76   %put The conc_name_macro variables contains the following data sets &conc_name_macro;
The conc_name_macro variables contains the following data sets DATA_QUANT DATA_QUANT1 DATA_QUANT2 DATA_QUANT3 DATA_QUANT4
/* merge the concentration summary files to create single table */
data c0;   set DATA_QUANT;  rename Estimate=C0_Est;   key=_n_; drop VarName conc;
data c80;  set DATA_QUANT1; rename Estimate=C80_Est;  key=_n_; drop VarName conc;
data c160; set DATA_QUANT2; rename Estimate=C160_Est; key=_n_; drop VarName conc;
data c235; set DATA_QUANT3; rename Estimate=C235_Est; key=_n_; drop VarName conc;
data c310; set DATA_QUANT4; rename Estimate=C310_Est; key=_n_; drop VarName conc;
data all; merge c0 c80 c160 c235 c310; by key; drop key;
proc print data=all; run;

Obs   Quantile      C0_Est   C80_Est   C160_Est   C235_Est   C310_Est
  1   100% Max        36.0      36.0       31.0       27.0         15
  2   99%             36.0      36.0       31.0       27.0         15
  3   95%             36.0      36.0       31.0       27.0         15
  4   90%             35.0      35.5       30.5       25.0         11
  5   75% Q3          34.0      33.0       30.0       21.0          6
  6   50% Median      32.5      32.5       29.0       16.5          6
  7   25% Q1          30.0      29.0       27.0       13.0          5
  8   10%             25.5      26.5       24.5        9.5          2
  9   5%              24.0      26.0       23.0        7.0          0
 10   1%              24.0      26.0       23.0        7.0          0
 11   0% Min          24.0      26.0       23.0        7.0          0
/* extract the rows-observations corresponding to the 5 number summary */
data fivenum; set all; if _n_=1 or _n_=5 or _n_=6 or _n_=7 or _n_=11;
proc print; run;

Obs   Quantile      C0_Est   C80_Est   C160_Est   C235_Est   C310_Est
  1   100% Max        36.0      36.0       31.0       27.0         15
  2   75% Q3          34.0      33.0       30.0       21.0          6
  3   50% Median      32.5      32.5       29.0       16.5          6
  4   25% Q1          30.0      29.0       27.0       13.0          5
  5   0% Min          24.0      26.0       23.0        7.0          0
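The five DATA steps and the merge can also be collapsed into one PROC TRANSPOSE call. The sketch below is an alternative, not the approach used in class: it assumes d1 is still sorted by conc, and the data set names q_all and q_wide are hypothetical. The resulting columns are named C0, C80, ..., C310 rather than C0_Est and so on, and the rows come out in alphabetical order of the Quantile label because of the sort.

```sas
/* sketch only: q_all and q_wide are hypothetical data set names */
ods output Quantiles=q_all;            /* one data set, all BY groups */
proc univariate data=d1;
  by conc;
  var total;
run;

proc sort data=q_all;
  by Quantile conc;                    /* transpose needs BY order */
run;

proc transpose data=q_all out=q_wide(drop=_name_) prefix=C;
  by Quantile;                         /* one output row per quantile */
  id conc;                             /* columns C0, C80, ..., C310  */
  var Estimate;
run;

proc print data=q_wide; run;
```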
Using ODS OUTPUT to create dataset in a simulation
/* Extracting coefficients from simple linear regression simulation*/
options formdlim="-" nodate;
/* generate simulation data sets Y ~ N(mu(x)= 3+2x, sigma=2) */
data sims;
  do dataset=1 to 1000;
    do x=1 to 10;
      y = 3 + 2*x + 2*rannor(0);
      output;
    end;
  end;
run;
/* DEBUG: print to check generated data */
proc print data=sims; run;
/* SORT for data set */
proc sort data=sims; by dataset; run;
/* USE OUTEST to extract the estimated coefficients */
proc reg data=sims outest=myparms;
  by dataset;
  model y=x;
run;
proc print data=myparms; run;
/* HISTOGRAM for estimated slope */proc gchart data=work.myparms; vbar x; run;
/* Re-do this with ODS */
*ods trace on; * determine what output objects are constructed;
ods output ParameterEstimates=reg_coefs;
proc reg data=sims; by dataset; model y=x; run;
proc print data=reg_coefs; run;
ods output close;
*ods trace off;
proc print data=reg_coefs; run;
proc contents data=reg_coefs; run;
data slopes; set reg_coefs; if Variable="x"; slope=Estimate; keep dataset slope;
data intercepts; set reg_coefs; if Variable="Intercept"; Intercept = Estimate; keep dataset intercept;
data both; merge slopes intercepts; by dataset;
proc gplot data=both;
  title "Plot of estimated slope vs. estimated intercept";
  plot slope*intercept;
run;
proc gchart data=both;
  title "Sampling distribution of the estimated slope";
  vbar slope;
run;
proc gchart data=both;
  title "Sampling distribution of the estimated intercept";
  vbar intercept;
run;
proc print data=slopes; run;
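One way to check the simulation is to compare the spread of the 1000 estimates with what the generating model implies. The sketch below assumes the data set both created above; the theoretical values in the comment are worked out from the usual simple linear regression variance formulas for x = 1, ..., 10 and sigma = 2, and are not part of the original handout.

```sas
/* sketch only: compares the simulated sampling distributions with the
   values implied by the generating model y = 3 + 2x + N(0, sd=2) */
proc means data=both n mean std min max maxdec=4;
  var slope intercept;
run;
/* theory for x = 1,...,10 (Sxx = 82.5, xbar = 5.5, sigma = 2):
   sd(slope)     = 2/sqrt(82.5)               = 0.2202 (approx.)
   sd(intercept) = 2*sqrt(1/10 + 5.5**2/82.5) = 1.3663 (approx.) */
```

The means should land near 2 and 3, and the standard deviations near the two values in the comment, up to simulation error.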
PROC TABULATE (producing fancier results tables in SAS)
PROC TABULATE <option(s)>;
  CLASS variable(s) </ options>;   * identify non-numeric vars;
  FREQ variable;   * identify variable containing frequency of observation;
  TABLE <<page-expression,> row-expression,> column-expression</ table-option(s)>;
  VAR analysis-variable(s)</ options>;   * identify analysis vars;
  WEIGHT variable;   * identify variable name – e.g. sampling wts;
* FORMATTING related subcommands …
  CLASSLEV variable(s) / STYLE=<style-element-name | PARENT> <[style-attribute-specification(s)] >;
  KEYLABEL keyword-1='description-1' <...keyword-n='description-n'>;
  KEYWORD keyword(s) / STYLE=<style-element-name | PARENT> <[style-attribute-specification(s)] >;
[* check out results of search for “Tabulate syntax” on www.muohio.edu/quantapps SAS doc]
Comments:
* concatenation (blank) operator
* crossing (*) operator
* format modifiers
* grouping elements (parentheses) operator
* ALL class variable
data d1;
  infile 'M:\public.www\classes\sta402\SAS-programs\ch2-dat.txt' firstobs=16 expandtabs missover pad;
  input @9 animal 2. @17 conc 3. @25 brood1 2. @33 brood2 2. @41 brood3 2. @49 total 2.;
proc tabulate data=d1;
  class conc;
  var brood1 brood2 brood3 total;
  table (brood1 brood2 brood3 total)*conc, min q1 median q3 max;
run;
proc tabulate data=d1;
  class conc;
  var total;
  table conc="Nitrofen Concentration" all, total*(mean var);
run;
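The operators listed under Comments can all appear in a single TABLE statement. The sketch below assumes the data set d1 read above; the row labels and the formats are illustrative choices rather than anything prescribed in the handout.

```sas
/* sketch only: the labels and formats are illustrative choices */
proc tabulate data=d1;
  class conc;
  var brood1 total;
  /* rows: conc concatenated (blank) with the ALL class variable      */
  /* columns: grouping (parentheses), crossing (*), format modifiers  */
  table conc='Nitrofen concentration' all='All concentrations',
        (brood1 total)*(n*f=4. mean*f=6.2 std*f=6.2);
run;
```

Each analysis variable is crossed with three statistics, so the table has six columns and one row per concentration plus a summary row.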
Week 04+/- [12+ Sept.] Class Activities
AN INTRODUCTION TO STATISTICAL MODELING
* PROC REG for linear modeling (a very basic introduction)
* PROC GLM for anova models
Other normal response modeling
ANOVA – balanced anova models
Non-normal response modeling
GENMOD – generalized linear models
LOGISTIC – [grouped] binary regression
PROBIT – [grouped] binary regression (INVERSECL)
CATMOD – categorical data modeling
Failure time modeling
LIFEREG – accelerated failure time models
PHREG – Cox’s PH model
And more …
REGRESSION using PROC REG
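Basic Model: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ [“simple linear regression”]
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \beta_5 X_{i5} + \epsilon_i$ [“multiple linear regression”]
Error Assumption: $\epsilon_i \sim$ indep. $N(0, \sigma^2)$
$i = 1, 2, \ldots, n$ [observations]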
/* example sas program that does simple linear regression*/
options ls=75;
data example1;
  input year nboats manatees;
  cards;
77 447 13
78 460 21
79 481 24
80 498 16
81 513 24
82 512 20
83 526 15
84 559 34
85 585 33
86 614 33
87 645 39
88 675 43
89 711 50
90 719 47
;
/* WARNING: ODS RTF will place TITLE information along with the SAS
   date/time/page number as part of a header in the RTF document.
   Check out Print Preview or view the header. */
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\linreg-output.rtf';
proc reg;
  title 'Number of Manatees killed regressed on the number of boats registered in Florida';
  model manatees = nboats / p r cli clm;
  plot manatees*nboats="o" p.*nboats="+" / overlay;
  plot r.*nboats r.*p.;
run;
ODS RTF CLOSE;
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       1711.97866    1711.97866     93.61   <.0001
Error             12        219.44991      18.28749
Corrected Total   13       1931.42857

Root MSE          4.27639   R-Square   0.8864
Dependent Mean   29.42857   Adj R-Sq   0.8769
Coeff Var        14.53141

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -41.43044          7.41222     -5.59     0.0001
nboats       1              0.12486          0.01290      9.68     <.0001
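As a check on how the pieces of this output fit together, the summary statistics can be recomputed from the table entries. The relations below are the standard simple linear regression identities with the numbers copied from the output; the last line uses the first data row (nboats = 447), and the small gap from the printed 14.3827 comes from rounding the coefficients.

```latex
\begin{align*}
F &= \frac{MS_{Model}}{MS_{Error}} = \frac{1711.97866}{18.28749} \approx 93.61 \\
R^2 &= \frac{SS_{Model}}{SS_{Total}} = \frac{1711.97866}{1931.42857} \approx 0.8864 \\
\text{Root MSE} &= \sqrt{18.28749} \approx 4.2764 \\
t_{nboats} &= \frac{0.12486}{0.01290} \approx 9.68 \\
\hat{Y}_1 &= -41.43044 + 0.12486\,(447) \approx 14.38
\end{align*}
```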
Output Statistics
Columns: Obs, Dep Var manatees, Predicted Value, Std Error Mean Predict, 95% CL Mean (lower, upper), 95% CL Predict (lower, upper), Residual, Std Error Residual, Student Residual
1 13.0000 14.3827 1.9299 10.1779 18.5876 4.1604 24.6050 -1.3827 3.816 -0.362
2 21.0000 16.0059 1.7974 12.0896 19.9222 5.8989 26.1130 4.9941 3.880 1.287
3 24.0000 18.6280 1.5976 15.1472 22.1089 8.6816 28.5745 5.3720 3.967 1.354
4 16.0000 20.7507 1.4528 17.5853 23.9161 10.9102 30.5911 -4.7507 4.022 -1.181
5 24.0000 22.6236 1.3420 19.6997 25.5475 12.8582 32.3891 1.3764 4.060 0.339
6 20.0000 22.4987 1.3488 19.5600 25.4375 12.7288 32.2687 -2.4987 4.058 -0.616
7 15.0000 24.2468 1.2622 21.4968 26.9968 14.5320 33.9616 -9.2468 4.086 -2.263
8 34.0000 28.3672 1.1482 25.8656 30.8689 18.7198 38.0147 5.6328 4.119 1.367
9 33.0000 31.6137 1.1650 29.0753 34.1520 21.9566 41.2707 1.3863 4.115 0.337
10 33.0000 35.2346 1.2909 32.4221 38.0472 25.5019 44.9673 -2.2346 4.077 -0.548
11 39.0000 39.1054 1.5187 35.7963 42.4144 29.2178 48.9929 -0.1054 3.998 -0.0264
12 43.0000 42.8512 1.7974 38.9349 46.7675 32.7442 52.9582 0.1488 3.880 0.0383
13 50.0000 47.3462 2.1762 42.6048 52.0877 36.8917 57.8007 2.6538 3.681 0.721
14 47.0000 48.3451 2.2647 43.4109 53.2794 37.8018 58.8884 -1.3451 3.628 -0.371
Output Statistics

Obs    -2-1 0 1 2        Cook's D
  1   |      |      |       0.017
  2   |      |**    |       0.178
  3   |      |**    |       0.149
  4   |    **|      |       0.091
  5   |      |      |       0.006
  6   |     *|      |       0.021
  7   |  ****|      |       0.244
  8   |      |**    |       0.073
  9   |      |      |       0.005
 10   |     *|      |       0.015
 11   |      |      |       0.000
 12   |      |      |       0.000
 13   |      |*     |       0.091
 14   |      |      |       0.027
Sum of Residuals 0
Sum of Squared Residuals 219.44991
Predicted Residual SS (PRESS) 281.76275
Multiple Regression with indicator variables
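$\log(\text{Brain Weight})_i = \beta_0 + \beta_1 \log(\text{Body Weight})_i + \epsilon_i$
$\log(\text{Brain Weight})_i = \beta_0 + \beta_1 \log(\text{Body Weight})_i + \beta_2 I_{dino,i} + \epsilon_i$
$\log(\text{Brain Weight})_i = \beta_0 + \beta_1 \log(\text{Body Weight})_i + \beta_2 I_{dino,i} + \beta_3 I_{dino,i} \times \log(\text{Body Weight})_i + \epsilon_i$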
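Substituting the two values of the dinosaur indicator into the largest of these models shows what each coefficient does. In the sketch below, $\beta_0$ and $\beta_1$ are the intercept and slope, $\beta_2$ is the coefficient of the indicator and $\beta_3$ the coefficient of the indicator-by-log-body-weight product.

```latex
\begin{align*}
\text{non-dinosaurs } (I_{dino}=0):\quad & E[\log(\text{Brain Weight})] = \beta_0 + \beta_1 \log(\text{Body Weight}) \\
\text{dinosaurs } (I_{dino}=1):\quad & E[\log(\text{Brain Weight})] = (\beta_0+\beta_2) + (\beta_1+\beta_3)\log(\text{Body Weight})
\end{align*}
```

So $\beta_2$ shifts the intercept and $\beta_3$ shifts the slope for dinosaurs. Setting $\beta_3 = 0$ gives parallel lines with different intercepts, and setting $\beta_2 = \beta_3 = 0$ gives a single line for all species.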
data mrexample;
* Lunneborg (1994);
* body weight brain example;
  input species $ bodywt brainwt @@;
  logbody = log10(bodywt);
  logbrain = log10(brainwt);
  idino = 0;
  if (species="diplodoc" or species="tricerat" or species="brachios") then idino=1;
  idinobod = idino*logbody;
  cards;
beaver 1.35 8.10 cow 465.00 423.00 wolf 36.33 119.50 goat 27.66 115.00
guipig 1.04 5.50 diplodocus 11700.00 50.00 asielephant 2547.00 4603.00
donkey 187.10 419.00 horse 521.00 655.00 potarmonkey 10.00 115.00
cat 3.30 25.60 giraffe 529.000 680.00 gorilla 207.00 406.00
human 62.00 1320.00 afrelephant 6654.00 5712.00 triceratops 9400.00 70.00
rhemonkey 6.80 179.00 kangaroo 35.00 56.00 hamster 0.12 1.00
mouse 0.023 0.40 rabbit 2.50 12.10 sheep 55.50 175.00 jaguar 100.00 157.00
chimp 52.16 440.00 brachiosaurus 87000.00 154.50
rat 0.28 1.90 mole 0.122 3.00 pig 192.00 180
;
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\mreg-output.rtf';
proc print;
  title 'brain wt - body wt data';
run;
proc univariate;
  var bodywt brainwt;
  id species;
run;
proc reg;
  title2 'allometric scaling - brain and body wt.';
  title3 '[All Species combined]';
  model logbrain=logbody;
  plot logbrain*logbody="o" p.*logbody="+" / overlay;
  plot r.*logbody;
run;
proc reg;
  title2 'Dinosaurs fitted with potentially different line';
  model logbrain=logbody idino idinobod;
  plot logbrain*logbody="o" p.*logbody="+" / overlay;
  plot r.*logbody;
run;
proc reg;
  title2 'Dinosaurs fitted with potentially different INTERCEPTS';
  model logbrain=logbody idino;
  plot logbrain*logbody="o" p.*logbody="+" / overlay;
  plot r.*logbody;
run;
ODS RTF CLOSE;
Obs species bodywt brainwt logbody logbrain idino idinobod
1 beaver 1.35 8.1 0.13033 0.90849 0 0.00000
2 cow 465.00 423.0 2.66745 2.62634 0 0.00000
3 wolf 36.33 119.5 1.56027 2.07737 0 0.00000
4 goat 27.66 115.0 1.44185 2.06070 0 0.00000
5 guipig 1.04 5.5 0.01703 0.74036 0 0.00000
6 diplodoc 11700.00 50.0 4.06819 1.69897 1 4.06819
7 asieleph 2547.00 4603.0 3.40603 3.66304 0 0.00000
8 donkey 187.10 419.0 2.27207 2.62221 0 0.00000
9 horse 521.00 655.0 2.71684 2.81624 0 0.00000
10 potarmon 10.00 115.0 1.00000 2.06070 0 0.00000
11 cat 3.30 25.6 0.51851 1.40824 0 0.00000
12 giraffe 529.00 680.0 2.72346 2.83251 0 0.00000
13 gorilla 207.00 406.0 2.31597 2.60853 0 0.00000
14 human 62.00 1320.0 1.79239 3.12057 0 0.00000
15 afreleph 6654.00 5712.0 3.82308 3.75679 0 0.00000
16 tricerat 9400.00 70.0 3.97313 1.84510 1 3.97313
17 rhemonke 6.80 179.0 0.83251 2.25285 0 0.00000
18 kangaroo 35.00 56.0 1.54407 1.74819 0 0.00000
19 hamster 0.12 1.0 -0.92082 0.00000 0 0.00000
20 mouse 0.02 0.4 -1.63827 -0.39794 0 0.00000
21 rabbit 2.50 12.1 0.39794 1.08279 0 0.00000
22 sheep 55.50 175.0 1.74429 2.24304 0 0.00000
23 jaguar 100.00 157.0 2.00000 2.19590 0 0.00000
24 chimp 52.16 440.0 1.71734 2.64345 0 0.00000
25 brachios 87000.00 154.5 4.93952 2.18893 1 4.93952
26 rat 0.28 1.9 -0.55284 0.27875 0 0.00000
27 mole 0.12 3.0 -0.91364 0.47712 0 0.00000
28 pig 192.00 180.0 2.28330 2.25527 0 0.00000
brain wt - body wt data
The UNIVARIATE Procedure
Variable: bodywt
BODY WEIGHT
Moments
N 28 Sum Weights 28
Mean 4278.43875 Sum Observations 119796.285
Std Deviation 16480.4904 Variance 271606563
Skewness 5.03388585 Kurtosis 26.0100719
Uncorrected SS 7845918273 Corrected SS 7333377205
Coeff Variation 385.198698 Std Error Mean 3114.51993
Basic Statistical Measures
Location Variability
Mean 4278.439 Std Deviation 16480
Median 53.830 Variance 271606563
Mode . Range 87000
Interquartile Range 490.10000
Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 1.373707 Pr > |t| 0.1808
Sign M 14 Pr >= |M| <.0001
Signed Rank S 203 Pr >= |S| <.0001
Quantiles (Definition 5)
Quantile Estimate
100% Max 87000.000
99% 87000.000
95% 11700.000
90% 9400.000
75% Q3 493.000
50% Median 53.830
25% Q1 2.900
10% 0.122
5% 0.120
1% 0.023
0% Min 0.023
Extreme Observations
Lowest Highest
Value species Obs Value species Obs
0.023 mouse 20 2547 asieleph 7
0.120 hamster 19 6654 afreleph 15
0.122 mole 27 9400 tricerat 16
0.280 rat 26 11700 diplodoc 6
1.040 guipig 5 87000 brachios 25
brain wt - body wt data
The UNIVARIATE Procedure
Variable: brainwt
BRAIN WEIGHT
Moments
N 28 Sum Weights 28
Mean 574.521429 Sum Observations 16086.6
Std Deviation 1334.92919 Variance 1782035.94
Skewness 3.33453913 Kurtosis 10.6457044
Uncorrected SS 57357066.9 Corrected SS 48114970.5
Coeff Variation 232.354987 Std Error Mean 252.277904
Basic Statistical Measures
Location Variability
Mean 574.5214 Std Deviation 1335
Median 137.0000 Variance 1782036
Mode 115.0000 Range 5712
Interquartile Range 402.15000
Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 2.277336 Pr > |t| 0.0309
Sign M 14 Pr >= |M| <.0001
Signed Rank S 203 Pr >= |S| <.0001
Quantiles (Definition 5)
Quantile Estimate
100% Max 5712.00
99% 5712.00
95% 4603.00
90% 1320.00
75% Q3 421.00
50% Median 137.00
25% Q1 18.85
10% 1.90
5% 1.00
1% 0.40
0% Min 0.40
Extreme Observations
Lowest Highest
Value species Obs Value species Obs
0.4 mouse 20 655 horse 9
1.0 hamster 19 680 giraffe 12
1.9 rat 26 1320 human 14
3.0 mole 27 4603 asieleph 7
5.5 guipig 5 5712 afreleph 15
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1         17.81230      17.81230     40.26   <.0001
Error             26         11.50305       0.44242
Corrected Total   27         29.31535

Root MSE          0.66515   R-Square   0.6076
Dependent Mean    1.92195   Adj R-Sq   0.5925
Coeff Var        34.60816

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              1.10958          0.17942      6.18     <.0001
logbody      1              0.49599          0.07817      6.35     <.0001
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3         27.01211       9.00404     93.82   <.0001
Error             24          2.30324       0.09597
Corrected Total   27         29.31535

Root MSE          0.30979   R-Square   0.9214
Dependent Mean    1.92195   Adj R-Sq   0.9116
Coeff Var        16.11844

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.93391          0.08562     10.91     <.0001
logbody      1              0.75226          0.04493     16.74     <.0001
idino        1             -0.91748          1.79054     -0.51     0.6131
idinobod     1             -0.31441          0.41371     -0.76     0.4547
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2         26.95668      13.47834    142.86   <.0001
Error             25          2.35867       0.09435
Corrected Total   27         29.31535

Root MSE          0.30716   R-Square   0.9195
Dependent Mean    1.92195   Adj R-Sq   0.9131
Coeff Var        15.98167

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.93879          0.08465     11.09     <.0001
logbody      1              0.74855          0.04428     16.90     <.0001
idino        1             -2.26674          0.23024     -9.84     <.0001
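The three fits can be compared formally with a joint test that both dinosaur terms are zero. The sketch below uses the TEST statement of PROC REG on the data set mrexample; the label DinoLine is a hypothetical name. Working by hand from the printed error sums of squares (11.50305 for the single line, 2.30324 on 24 df for the separate lines) gives F = ((11.50305 - 2.30324)/2)/0.09597, roughly 47.9 on 2 and 24 degrees of freedom, which is what the TEST statement is expected to reproduce.

```sas
/* sketch only: joint test that the dinosaur intercept shift and slope
   shift are both zero; the label DinoLine is a hypothetical name */
proc reg data=mrexample;
  model logbrain = logbody idino idinobod;
  DinoLine: test idino = 0, idinobod = 0;
run;
quit;
```

The individual t tests for idino and idinobod in the full model are both non-significant even though the joint test is not, which is the usual symptom of two highly correlated predictors.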
One-way ANOVA
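Basic Model: $Y_{ij} = \mu_i + \epsilon_{ij}$ [“cell means” coding]
$= \mu + \alpha_i + \epsilon_{ij}$ [“effects” coding]
(constraint for estimation? $\alpha_1 = 0$ or $\alpha_g = 0$ or $\sum_i \alpha_i = 0$)
Error Assumption: $\epsilon_{ij} \sim$ indep. $N(0, \sigma^2)$
$i = 1, 2, \ldots, g$ [treatments or populations]
$j = 1, 2, \ldots, n_i$ [replications]
$H_0: \mu_1 = \cdots = \mu_g$ or equivalently, $H_0: \alpha_1 = \cdots = \alpha_g = 0$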
/* Bacteria in meat under 4 different conditions */
options ls = 75;
data meat;
  input condition $ logcount @@;
  datalines;
Plastic 7.66 Plastic 6.98 Plastic 7.80
Vacuum 5.26 Vacuum 5.44 Vacuum 5.80
Mixed 7.41 Mixed 7.33 Mixed 7.04
Co2 3.51 Co2 2.91 Co2 3.66
;
title bacteria growth under 4 packaging conditions;
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\oneway-output.rtf';
proc boxplot;
  plot logcount*condition;
run;
proc glm data=meat order=data;
  title2 fitting the one-way anova model via GLM;
  class condition;
  model logcount = condition;
  means condition / bon tukey scheffe cldiff lines;
  lsmeans condition / cl pdiff;
  contrast 'plastic vs. rest' condition 3 -1 -1 -1;
  output out=new p=yhat r=resid stdr=eresid;
run;
proc plot data=new;
  title2 residual analyses;
  plot resid*yhat;
run;
proc univariate data=new plot;
  var resid;
run;
proc boxplot data=new;
  plot resid*condition;
run;
ODS RTF CLOSE;
bacteria growth under 4 packaging conditions
fitting the one-way anova model via GLM
The GLM Procedure
Class Level Information
Class Levels Values
condition 4 Plastic Vacuum Mixed Co2
Number of observations 12
Dependent Variable: logcount
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      32.87280000   10.95760000     94.58   <.0001
Error              8       0.92680000    0.11585000
Corrected Total   11      33.79960000

R-Square   Coeff Var   Root MSE   logcount Mean
0.972580    5.768940   0.340367        5.900000
Source DF Type I SS Mean Square F Value Pr > F
condition 3 32.87280000 10.95760000 94.58 <.0001
Source DF Type III SS Mean Square F Value Pr > F
condition 3 32.87280000 10.95760000 94.58 <.0001
Tukey's Studentized Range (HSD) Test for logcount
NOTE: This test controls the Type I experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 8
Error Mean Square 0.11585
Critical Value of Studentized Range 4.52880
Minimum Significant Difference 0.89
Comparisons significant at the 0.05 level are indicated by ***.
condition Comparison   Difference Between Means   Simultaneous 95% Confidence Limits
Plastic - Mixed 0.2200 -0.6700 1.1100
Plastic - Vacuum 1.9800 1.0900 2.8700 ***
Plastic - Co2 4.1200 3.2300 5.0100 ***
Mixed - Plastic -0.2200 -1.1100 0.6700
Mixed - Vacuum 1.7600 0.8700 2.6500 ***
Mixed - Co2 3.9000 3.0100 4.7900 ***
Vacuum - Plastic -1.9800 -2.8700 -1.0900 ***
Vacuum - Mixed -1.7600 -2.6500 -0.8700 ***
Vacuum - Co2 2.1400 1.2500 3.0300 ***
Co2 - Plastic -4.1200 -5.0100 -3.2300 ***
Co2 - Mixed -3.9000 -4.7900 -3.0100 ***
Co2 - Vacuum -2.1400 -3.0300 -1.2500 ***
Bonferroni (Dunn) t Tests for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.
Alpha 0.05
Error Degrees of Freedom 8
Error Mean Square 0.11585
Critical Value of t 3.47888
Minimum Significant Difference 0.9668
Comparisons significant at the 0.05 level are indicated by ***.
condition Comparison   Difference Between Means   Simultaneous 95% Confidence Limits
Plastic - Mixed 0.2200 -0.7468 1.1868
Plastic - Vacuum 1.9800 1.0132 2.9468 ***
Plastic - Co2 4.1200 3.1532 5.0868 ***
Mixed - Plastic -0.2200 -1.1868 0.7468
Mixed - Vacuum 1.7600 0.7932 2.7268 ***
Mixed - Co2 3.9000 2.9332 4.8668 ***
Vacuum - Plastic -1.9800 -2.9468 -1.0132 ***
Vacuum - Mixed -1.7600 -2.7268 -0.7932 ***
Vacuum - Co2 2.1400 1.1732 3.1068 ***
Co2 - Plastic -4.1200 -5.0868 -3.1532 ***
Co2 - Mixed -3.9000 -4.8668 -2.9332 ***
Co2 - Vacuum -2.1400 -3.1068 -1.1732 ***
Scheffe's Test for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.
Alpha 0.05
Error Degrees of Freedom 8
Error Mean Square 0.11585
Critical Value of F 4.06618
Minimum Significant Difference 0.9706
Comparisons significant at the 0.05 level are indicated by ***.
condition Comparison   Difference Between Means   Simultaneous 95% Confidence Limits
Plastic - Mixed 0.2200 -0.7506 1.1906
Plastic - Vacuum 1.9800 1.0094 2.9506 ***
Plastic - Co2 4.1200 3.1494 5.0906 ***
Mixed - Plastic -0.2200 -1.1906 0.7506
Mixed - Vacuum 1.7600 0.7894 2.7306 ***
Mixed - Co2 3.9000 2.9294 4.8706 ***
Vacuum - Plastic -1.9800 -2.9506 -1.0094 ***
Vacuum - Mixed -1.7600 -2.7306 -0.7894 ***
Vacuum - Co2 2.1400 1.1694 3.1106 ***
Co2 - Plastic -4.1200 -5.0906 -3.1494 ***
Co2 - Mixed -3.9000 -4.8706 -2.9294 ***
Co2 - Vacuum -2.1400 -3.1106 -1.1694 ***
Tukey's Studentized Range (HSD) Test for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 8
Error Mean Square 0.11585
Critical Value of Studentized Range 4.52880
Minimum Significant Difference 0.89
Means with the same letter are not significantly different.
Tukey Grouping Mean N condition
A 7.4800 3 Plastic
A
A 7.2600 3 Mixed
B 5.5000 3 Vacuum
C 3.3600 3 Co2
Bonferroni (Dunn) t Tests for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 8
Error Mean Square 0.11585
Critical Value of t 3.47888
Minimum Significant Difference 0.9668
Means with the same letter are not significantly different.
Bon Grouping Mean N condition
A 7.4800 3 Plastic
A
A 7.2600 3 Mixed
B 5.5000 3 Vacuum
C 3.3600 3 Co2
Scheffe's Test for logcount
NOTE: This test controls the Type I experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 8
Error Mean Square 0.11585
Critical Value of F 4.06618
Minimum Significant Difference 0.9706
Means with the same letter are not significantly different.
Scheffe Grouping Mean N condition
A 7.4800 3 Plastic
A
A 7.2600 3 Mixed
B 5.5000 3 Vacuum
C 3.3600 3 Co2
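The three "Minimum Significant Difference" values reported above can be reproduced from the critical values in the output. The sketch below uses the standard equal-sample-size formulas with $n = 3$ observations per condition, $g = 4$ conditions and $MSE = 0.11585$; every number is copied from the listings.

```latex
\begin{align*}
\text{Tukey:}\quad & q\,\sqrt{MSE/n} = 4.52880\sqrt{0.11585/3} \approx 0.890 \\
\text{Bonferroni:}\quad & t\,\sqrt{2\,MSE/n} = 3.47888\sqrt{2(0.11585)/3} \approx 0.9668 \\
\text{Scheff\'e:}\quad & \sqrt{(g-1)F}\,\sqrt{2\,MSE/n} = \sqrt{3(4.06618)}\,\sqrt{2(0.11585)/3} \approx 0.9706
\end{align*}
```

Tukey gives the smallest threshold here, which matches the notes in the output about Bonferroni and Scheffe having a higher Type II error rate for all pairwise comparisons.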
Least Squares Means

condition   logcount LSMEAN   LSMEAN Number
Plastic 7.48000000 1
Vacuum 5.50000000 2
Mixed 7.26000000 3
Co2 3.36000000 4
Least Squares Means for effect condition
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: logcount

i/j        1        2        3        4
1               <.0001   0.4514   <.0001
2     <.0001            0.0002   <.0001
3     0.4514   0.0002            <.0001
4     <.0001   <.0001   <.0001
condition   logcount LSMEAN   95% Confidence Limits
Plastic 7.480000 7.026844 7.933156
Vacuum 5.500000 5.046844 5.953156
Mixed 7.260000 6.806844 7.713156
Co2 3.360000 2.906844 3.813156
Least Squares Means for Effect condition
i   j   Difference Between Means   95% Confidence Limits for LSMean(i)-LSMean(j)
1 2 1.980000 1.339141 2.620859
1 3 0.220000 -0.420859 0.860859
1 4 4.120000 3.479141 4.760859
2 3 -1.760000 -2.400859 -1.119141
2 4 2.140000 1.499141 2.780859
3 4 3.900000 3.259141 4.540859
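These LSMEANS limits are unadjusted t intervals, which is why they are narrower than the simultaneous limits from the MEANS statement. A sketch of the arithmetic, assuming the usual formulas with $t_{0.975,8} \approx 2.306$, $MSE = 0.11585$ and $n = 3$ per condition, for the Plastic mean and the Plastic minus Vacuum difference:

```latex
\begin{align*}
\bar{Y}_{i\cdot} \pm t_{0.975,8}\sqrt{MSE/n} &:\quad 7.48 \pm 2.306\sqrt{0.11585/3} \approx 7.48 \pm 0.453 \\
(\bar{Y}_{i\cdot}-\bar{Y}_{j\cdot}) \pm t_{0.975,8}\sqrt{2\,MSE/n} &:\quad 1.98 \pm 2.306\sqrt{2(0.11585)/3} \approx 1.98 \pm 0.641
\end{align*}
```

These give (7.027, 7.933) and (1.339, 2.621), agreeing with the tables above.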
NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
Dependent Variable: logcount
Contrast DF Contrast SS Mean Square F Value Pr > F
plastic vs. rest 1 9.98560000 9.98560000 86.19 <.0001
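The contrast sum of squares can be verified from the condition means. With ORDER=DATA the levels are Plastic, Vacuum, Mixed, Co2, so the coefficients 3, -1, -1, -1 line up with the means 7.48, 5.50, 7.26, 3.36. The sketch below uses the standard single-degree-of-freedom contrast formula with $n_i = 3$.

```latex
\begin{align*}
\hat{L} &= 3(7.48) - 5.50 - 7.26 - 3.36 = 6.32 \\
SS_{contrast} &= \frac{\hat{L}^2}{\sum_i c_i^2/n_i} = \frac{6.32^2}{(9+1+1+1)/3} = \frac{39.9424}{4} = 9.9856 \\
F &= \frac{SS_{contrast}}{MSE} = \frac{9.9856}{0.11585} \approx 86.19
\end{align*}
```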
bacteria growth under 4 packaging conditions
residual analyses
[Plot of resid*yhat. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: resid, -0.5 to 0.4; horizontal axis: yhat, 3.0 to 7.5.]
The UNIVARIATE Procedure
Variable: resid
Moments
N 12 Sum Weights 12
Mean 0 Sum Observations 0
Std Deviation 0.29026634 Variance 0.08425455
Skewness -0.6294875 Kurtosis -0.971163
Uncorrected SS 0.9268 Corrected SS 0.9268
Coeff Variation . Std Error Mean 0.08379267
Basic Statistical Measures
Location Variability
Mean 0.000000 Std Deviation 0.29027
Median 0.110000 Variance 0.08425
Mode 0.300000 Range 0.82000
Interquartile Range 0.47000
Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 0 Pr > |t| 1.0000
Sign M 1 Pr >= |M| 0.7744
Signed Rank S 2 Pr >= |S| 0.8931
Quantiles (Definition 5)
Quantile Estimate
100% Max 0.32
99% 0.32
95% 0.32
90% 0.30
75% Q3 0.24
50% Median 0.11
25% Q1 -0.23
10% -0.45
5% -0.50
1% -0.50
0% Min -0.50
Extreme Observations
Lowest Highest
Value Obs Value Obs
-0.50 2 0.15 7
-0.45 11 0.18 1
-0.24 4 0.30 6
-0.22 9 0.30 12
-0.06 5 0.32 3
Stem Leaf     #
   3 002      3
   2
   1 558      3
   0 7        1
  -0 6        1
  -1
  -2 42       2
  -3
  -4 5        1
  -5 0        1
Multiply Stem.Leaf by 10**-1
[Boxplot and normal probability plot of resid (line-printer graphics).]
Two-way ANOVA/ Factorial example 2 - interaction plots
Patient Waiting Time data
Factorial anova model
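Basic Model: $Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$ [“cell means” coding]
$= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$ [“effects” coding]
Error Assumption: $\varepsilon_{ijk} \sim$ indep. $N(0, \sigma^2)$
$i = 1, 2, \ldots, a$ [levels of factor A]
$j = 1, 2, \ldots, b$ [levels of factor B]
$k = 1, 2, \ldots, n_{ij}$ [replications]
H0: all $(\alpha\beta)_{ij} = 0$ [no interaction]
H0: all $\alpha_i = 0$ [no A main effect]
H0: all $\beta_j = 0$ [no B main effect]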
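In a 2×2 layout the no-interaction hypothesis reduces to a single contrast among the four cell means. The worked check below is a sketch only. It assumes a balanced design with n = 4 observations per cell, and it takes the cell means (20, 25, 30, 26.25) and the error mean square (18.2291667) from the GLM output for the patient waiting-time example in this handout. The symbols $L$ and $c_{ij}$ (contrast coefficients +1, -1, -1, +1) are introduced here for illustration.

```latex
\begin{align*}
L &= \mu_{11} - \mu_{12} - \mu_{21} + \mu_{22}, \qquad H_0\colon L = 0 \\
\hat{L} &= \bar{y}_{11\cdot} - \bar{y}_{12\cdot} - \bar{y}_{21\cdot} + \bar{y}_{22\cdot}
         = 20 - 25 - 30 + 26.25 = -8.75 \\
SS_{\text{interaction}} &= \frac{\hat{L}^{2}}{\sum_{i,j} c_{ij}^{2} / n}
         = \frac{(-8.75)^{2}}{4/4} = 76.5625 \\
F &= \frac{SS_{\text{interaction}} / 1}{MSE}
   = \frac{76.5625}{18.2291667} \approx 4.20
\end{align*}
```

These values agree with the doctype*practype line of the GLM output for this example (SS 76.5625, F 4.20 on 1 and 12 df).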
ODS ESCAPECHAR='^'; /* for fancy formatting later */
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\twoway-output.rtf';
title Two-way ANOVA/ Factorial example 2 - interaction plots;
title2 Patient Waiting Time data;
data cwait;
  input doctype $ practype $ time @@;
  cards;
gen group 15 gen group 20 gen group 25 gen group 20
gen solo 20 gen solo 25 gen solo 30 gen solo 25
spec group 30 spec group 25 spec group 30 spec group 35
spec solo 25 spec solo 20 spec solo 30 spec solo 30
;
proc print;
proc sort; by doctype practype;
proc means noprint; by doctype practype;
  output out=factmean mean=timemean;
proc plot data=factmean;
  plot timemean*doctype=practype / vaxis=0 to 35 by 5;
proc glm data=cwait order=data;
  class doctype practype;
  model time=doctype|practype;
  /* equivalent model statement
     model time=doctype practype doctype*practype;
  */
  output out=new p=yhat r=resid;
  lsmeans doctype practype doctype*practype / stderr pdiff;
  means doctype practype doctype*practype / tukey;
run;
ODS RTF CLOSE;
Obs doctype practype time
1 gen group 15
2 gen group 20
3 gen group 25
4 gen group 20
5 gen solo 20
6 gen solo 25
7 gen solo 30
8 gen solo 25
9 spec group 30
10 spec group 25
11 spec group 30
12 spec group 35
13 spec solo 25
14 spec solo 20
15 spec solo 30
16 spec solo 30
[Plot of timemean*doctype (line-printer plot). Symbol is value of practype (g = group, s = solo). Vertical axis: timemean, 0 to 35 by 5; horizontal axis: doctype (gen, spec).]
The GLM Procedure
Class Level Information
Class Levels Values
doctype 2 gen spec
practype 2 group solo
Number of observations 16
The GLM Procedure
Dependent Variable: time
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 204.6875000 68.2291667 3.74 0.0415
Error 12 218.7500000 18.2291667
Corrected Total 15 423.4375000
R-Square Coeff Var Root MSE time Mean
0.483395 16.86741 4.269563 25.31250
Source DF Type I SS Mean Square F Value Pr > F
doctype 1 126.5625000 126.5625000 6.94 0.0218
practype 1 1.5625000 1.5625000 0.09 0.7747
doctype*practype 1 76.5625000 76.5625000 4.20 0.0629
Source DF Type III SS Mean Square F Value Pr > F
doctype 1 126.5625000 126.5625000 6.94 0.0218
practype 1 1.5625000 1.5625000 0.09 0.7747
doctype*practype 1 76.5625000 76.5625000 4.20 0.0629
The GLM Procedure
Least Squares Means
doctype time LSMEAN Standard Error H0:LSMEAN=0 Pr > |t| H0:LSMean1=LSMean2 Pr > |t|
gen 22.5000000 1.5095184 <.0001 0.0218
spec 28.1250000 1.5095184 <.0001
practype time LSMEAN Standard Error H0:LSMEAN=0 Pr > |t| H0:LSMean1=LSMean2 Pr > |t|
group 25.0000000 1.5095184 <.0001 0.7747
solo 25.6250000 1.5095184 <.0001
doctype practype time LSMEAN Standard Error Pr > |t| LSMEAN Number
gen group 20.0000000 2.1347814 <.0001 1
gen solo 25.0000000 2.1347814 <.0001 2
spec group 30.0000000 2.1347814 <.0001 3
spec solo 26.2500000 2.1347814 <.0001 4
Least Squares Means for effect doctype*practype
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: time
i/j         1         2         3         4
1                0.1236    0.0062    0.0607
2      0.1236              0.1236    0.6861
3      0.0062    0.1236              0.2379
4      0.0607    0.6861    0.2379
NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
The GLM Procedure
Tukey's Studentized Range (HSD) Test for time
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 12
Error Mean Square 18.22917
Critical Value of Studentized Range 3.08132
Minimum Significant Difference 4.6513
Means with the same letter are not significantly different.
Tukey Grouping Mean N doctype
A 28.125 8 spec
B 22.500 8 gen
The GLM Procedure
Tukey's Studentized Range (HSD) Test for time
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 12
Error Mean Square 18.22917
Critical Value of Studentized Range 3.08132
Minimum Significant Difference 4.6513
Means with the same letter are not significantly different.
Tukey Grouping Mean N practype
A 25.625 8 solo
A
A 25.000 8 group
The GLM Procedure
Tukey's Studentized Range (HSD) Test for time
Level of doctype  Level of practype  N  time Mean  time Std Dev
gen group 4 20.0000000 4.08248290
gen solo 4 25.0000000 4.08248290
spec group 4 30.0000000 4.08248290
spec solo 4 26.2500000 4.78713554
Bonus Material – if you end up using SAS on the Cluster
Unix day!
Create a file containing the following SAS code in your favorite text editor (save it as plain text).
/* collin-sim.sas; first version: collin-s96.sas
   constructed to illustrate collinearity in regression class
*/
options ls=74;
title analyses illustrating co-linearity in multiple regression;
data d1;
  input x y @@;
  x2 = x + ranuni(0);
  x3 = x - ranuni(0);
cards;
10.0 8.04 8.0 6.95 13.0 7.58 9.0 8.81 11.0 8.33 14.0 9.96
6.0 7.24 4.0 4.26 12.0 10.84 7.0 4.82 5.0 5.68
;
proc print; run;
proc reg; model y = x; run;
proc corr; var x x2 x3; run;
proc reg data=d1;
  model y = x x2 x3 / r influence tol vif collinoint xpx i;
  output out=p1out r=resid p=pred;
run;

MacOS
1. start "X11" application - look in ../applications/utilities
2. ssh -X [email protected]
3. R ... sas ... em (enterprise miner)
(file transfer via "fugu" in applications)

Windows OS
1. "putty" to start terminal
2. "Xming" to display X graphics
3. "WinSCP" to transfer files ...
1. Start up an "ssh" or "putty" [old days … TELNET or XTERM] client and log in to the redhawk.hpc.muohio.edu cluster [need account]
2. Basic UNIX shell commands – WARNING: UNIX CaSe-SEnsITive
“passwd” - change your password
“man” – get “manuals” / help for a particular command (e.g. man ls)
“ls” – list the files in your current working directory (e.g. ls -l)
“rm” – deletes a file
“mv” – moves or renames a file (e.g. mv old.sas new.sas); use “cp” to copy a file
“cat” – concatenate and display file (e.g. “cat <filename>” prints file; “cat file1 file2 > file3” concatenates file1 and file2 and writes result to file3) [> directs output in this example]
“grep” – search a file for a pattern (globally search for a regular expression and print matching lines)
“pwd” – print name of the current working directory
“mkdir” – create a directory in the current working directory (e.g. mkdir sta402)
“rmdir” – remove a directory
“cd” – change directory (e.g. cd sta402 to move into the “sta402” directory; cd .. to move up one level in the directory tree)
“more <filename>” - display the contents of file “filename” (space bar = next page; b = previous page; q = quit more; /name = search for name)
“head <filename>” – display the first few lines of a file
“tail <filename>” – display the last few lines of a file
“sas <filename.sas>” – causes SAS to execute commands in “filename.sas” in batch mode
“Splus” – starts S-Plus (provides “>” prompt; q() = quit S+)
“exit” or “logout” to end session
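The basic commands above can be tried safely in a scratch directory. The script below is a minimal sketch, not part of the original handout: it assumes a POSIX shell with mktemp available, and every file name in it (old.sas, other.sas, both.sas) is made up for illustration.

```shell
#!/bin/sh
# Scratch walk-through of the basic commands; all file names are made up.
set -e
workdir=$(mktemp -d)               # temporary sandbox so nothing real is touched
cd "$workdir"

mkdir sta402                       # create a class directory
cd sta402                          # move into it
pwd                                # print the current working directory

printf 'proc print;\nrun;\n' > old.sas     # two small files to play with
printf 'proc means;\nrun;\n' > other.sas

mv old.sas new.sas                 # rename: old.sas no longer exists afterwards
cat new.sas other.sas > both.sas   # concatenate two files into a third
ls -l                              # long listing of the directory

n_proc=$(grep -c 'proc' both.sas)  # number of lines containing "proc"
first_line=$(head -n 1 both.sas)   # first line of the combined file
last_line=$(tail -n 1 both.sas)    # last line of the combined file
echo "$n_proc lines mention proc; first: $first_line; last: $last_line"

rm new.sas other.sas both.sas      # delete the files
cd ..
rmdir sta402                       # rmdir only removes an empty directory
cd /
rmdir "$workdir"                   # clean up the sandbox itself
```

Because mv renames rather than copies, only new.sas exists after the mv step; use cp when both files should be kept.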
* can link commands together (“piping”)
ls -l | more
ls -l | grep "drwx"
* deleting characters - <ctrl>-u deletes everything to the left of the cursor on the current line
* can recall earlier commands using arrow keys
* “history” gives a list of all commands issued – select a numbered command by “!<number>”
* can edit previous commands
ls -l | grep "Sp"
^Sp^Sep
[reruns the previous command with “Sp” replaced by “Sep”]
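Piping can be sketched non-interactively as follows. This is an illustration under assumptions, not part of the original handout: the directory and file names are invented, and the history shortcuts (arrow keys, !<number>, ^old^new) are left out because they only work at an interactive prompt.

```shell
#!/bin/sh
# Piping sketch; the directories and files created here are made up.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir sta402 Splus-work
touch Sep-notes.txt collin-sim.sas

# Long listing piped to grep: count lines that start with "d" (directories)
n_dirs=$(ls -l | grep -c '^d')

# Three-stage pipe: list names, keep those containing a pattern, count them
n_sp=$(ls | grep 'Sp' | wc -l | tr -d ' ')
n_sep=$(ls | grep 'Sep' | wc -l | tr -d ' ')

echo "directories: $n_dirs; names with Sp: $n_sp; names with Sep: $n_sep"

cd /
rm -r "$workdir"                   # remove the sandbox and its contents
```

Each stage reads the output of the stage before it, so no intermediate files are needed.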
3. Log onto redhawk.hpc.muohio.edu (via an ssh/putty session)
a. change your password
b. create a directory for this class (mkdir sta402 or mkdir sta502 or …)
4. FTP onto unixgen.muohio.edu
a. cd sta402
b. “put” the SAS file that you created earlier
c. quit ftp
5. Run SAS on the commands in the file that you transferred.
a. Type “sas filename” (omit the .sas extension)
b. This will create a log file (filename.log) and a listing file (filename.lst)
c. Use more to check the filename.log and the filename.lst file.
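Besides paging through the log with more, grep can pull out the lines that need attention. The sketch below is an assumption-laden illustration: check_log is a hypothetical helper, it relies on problem lines in the SAS log starting with ERROR or WARNING, and the log text it runs on is a made-up stand-in, not real SAS output.

```shell
#!/bin/sh
# Hypothetical helper: list ERROR/WARNING lines in a batch log and count them.
set -e

check_log() {
    log=$1
    # grep exits non-zero when nothing matches, so "|| true" keeps set -e happy
    n_problems=$(grep -c -E '^(ERROR|WARNING)' "$log" || true)
    grep -n -E '^(ERROR|WARNING)' "$log" || true   # show matches with line numbers
    echo "$log: $n_problems ERROR/WARNING line(s)"
}

# Stand-in log file with placeholder text (NOT real SAS output)
workdir=$(mktemp -d)
printf '%s\n' \
    'NOTE: placeholder note line' \
    'WARNING: placeholder warning line' \
    'ERROR: placeholder error line' \
    'NOTE: another placeholder note line' > "$workdir/collin-sim.log"

check_log "$workdir/collin-sim.log"
rm -r "$workdir"
```

On the cluster the same helper would be called on the real file, for example check_log collin-sim.log after running sas collin-sim.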